Academia-To-Industry Transition Of Search And Learning-
Based Software Engineering: Opportunities And Challenges
Bestoun S. Ahmed, Ph.D.
Department of Computer Science and Engineering
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo náměstí 13, 121 35 Praha 2
@bestoon82
albeybes@fel.cvut.cz
www.bestoun.net
The Two Cultures
• 1959: the clash of “the two cultures”
• The humanities and the
sciences.
• A similar cleavage between the
academy and industry.
“We are publishing many great solutions for today's problems”
– But why are most of them not used by industry?
Academia-Industry Collaboration
TRANSACTIONS OF THE AMERICAN CLINICAL AND CLIMATOLOGICAL ASSOCIATION, VOL. 113, 2002
ACADEMIC-INDUSTRIAL COLLABORATION: THE GOOD,
THE BAD, AND THE UGLY
JOSEPH B. MARTIN
BOSTON, MA
ABSTRACT
Academic-industrial collaborations and technology transfer have
over the past 50 years played an increasingly prominent role in the
biomedical sciences. University partnerships with industry can expe-
dite the availability of innovative drugs and other medical technolo-
gies, bringing both important public health benefits and a source of
income for universities and their faculty through a variety of financial
arrangements. However, these relationships raise ethical concerns,
particularly when research involves human subjects in clinical trials.
Lapses in oversight of industry-sponsored clinical trials at universi-
ties, and especially patient deaths in a number of trials, have brought
these issues into the public spotlight and have led the federal govern-
ment to intensify its oversight of clinical research. The leadership of
Harvard Medical School convened a group of leaders in academic
Academia-Industry: Two Different Missions
• Academic mission:
• Education and discovery driven
by intellectual curiosity-what we
in academia like to regard as
"pure motives.”
• Industry mission:
• Translational research,
commercialization, and profit
making.
Breaches between the two missions?
Breach The Wall
• Science, Technology, Engineering, Computer Science (19th century onwards)
Patents > Licensing > Royalties
• Medical Devices and Biotechnology (1950 onwards)
• Basic science support from industry (1980 onwards)
ACADEMIC-INDUSTRIAL COLLABORATION
FIG. 1. The two cultures. Academia's mission: education and discovery driven by intellectual curiosity ("pure motives"). Industry's mission: translational research, commercialization, profit making.
wall separating these two activities, rendering an increasingly porous
interface, admired by many and abhorred by some. The first breach in
the wall developed around technology, engineering, and computer science,
which led to a very deliberate process of patenting, licensing, and
royalty income by major research universities engaged in the funda-
mental sciences (Table 1). During the last 50 years, with the National
TABLE 1
Breaches in the Wall
1. Science, Technology, Engineering, Computer Science (19th century onwards)
Patents > Licensing > Royalties
2. Medical Devices and Biotechnology (1950-2000): same process
New ethical issue: agents or devices to be used in humans
Information and Software Technology 79 (2016) 106–127
Contents lists available at ScienceDirect
Information and Software Technology
journal homepage: www.elsevier.com/locate/infsof
Challenges and best practices in industry-academia collaborations in
software engineering: A systematic literature review
Vahid Garousi (a, b, *), Kai Petersen (c), Baris Ozkan (d)
(a) Software Engineering Research Group, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
(b) Maral Software Engineering Consulting Corporation, Calgary, Canada
(c) Department of Software Engineering, School of Engineering, Blekinge Institute of Technology, Sweden
(d) Department of Information Systems Engineering, Atilim University, Ankara, Turkey
Article info
Article history:
Received 31 December 2015
Revised 5 May 2016
Accepted 23 July 2016
Available online 30 July 2016
Keywords:
Software engineering
Industry-academia collaborations
Industry
Universities
Challenges
Success patterns
Best practices
Systematic literature review
Abstract
Context: The global software industry and the software engineering (SE) academia are two large commu-
nities. However, unfortunately, the level of joint industry-academia collaborations in SE is still relatively
very low, compared to the amount of activity in each of the two communities. It seems that the two
’camps’ show only limited interest/motivation to collaborate with one another. Many researchers and practitioners
have written about the challenges, success patterns (what to do, i.e., how to collaborate) and
anti-patterns (what not to do) for industry-academia collaborations.
Objective: To identify (a) the challenges to avoid risks to the collaboration by being aware of the chal-
lenges, (b) the best practices to provide an inventory of practices (patterns) allowing for an informed
choice of practices to use when planning and conducting collaborative projects.
Method: A systematic review has been conducted. Synthesis has been done using grounded-theory based
coding procedures.
Results: Through thematic analysis we identified 10 challenge themes and 17 best practice themes. A key
outcome was the inventory of best practices, the most common ones recommended in different contexts
*Study period 2003-2016
The ratio of authors from academia, industry, and
joint authorships (Published Research Papers)
SE topic areas of the projects studied
Types of contributions Research types
“Keep in mind!”
– Some companies have NDAs, and it is hard to convince them to publish papers.
The Relationship: Experience
• Universities are shifting their management vision toward funding-oriented
management.
• Many requests for collaboration come from academia.
• Fewer responses come from industry.
[Diagram: many collaboration requests flow from academia to industry; fewer flow from industry to academia.]
Theory versus practice
Industry versus academe
Challenges To Collaborate In Soft Eng.
• Garousi et al.* list many challenges.
• Most relevant to us:
• Results produced through research are not relevant for practice
• Researchers do not understand the relevant problems from an
industry point of view
• Different interests and objectives
• Different reward systems
• Lack of prior relationships between a company and academia
• Lack of resources due to high investment in terms of resources
• Licensing restrictions on tools
*V. Garousi, K. Petersen, and B. Ozkan, “Challenges and best practices in industry-academia collaborations in software engineering: A systematic literature review,”
Inf. Softw. Technol., vol. 79, pp. 106–127, Nov. 2016
Barriers: Our Experience
• Companies are interested in fast output.
• Academia is interested in publication, which industry generally does
not prefer (to avoid disclosing information).
• Bureaucracy from the university side (especially the lawyers).
• Bureaucracy from the industry side, especially to find the
contact person.
• Size of the company is important.
• The ease of collaborating and of finding a contact decreases as the company size grows.
Another chance to collaborate ?
“Search and Learning-based Software Engineering”
One Of My Meetings With Industry
- Look, I don’t think search or learning-based algorithms will contribute to our work.
- But, let us do the meeting.
• That is basically the end of the meeting from the beginning.
SBSE
“The application of metaheuristic search-based
optimization techniques to find near-optimal solutions
in software engineering problems”
• The term SBSE was first used in 2001 by Harman and Jones.
• However, the application of optimization to a software engineering problem had already been
reported by Webb Miller and David Spooner in 1976, in the area of software testing.
Search-based software engineering
Mark Harman (a, *), Bryan F. Jones (b, 1)
(a) Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
(b) School of Computing, University of Glamorgan, Pontypridd, CF37 1DL, UK
Abstract
This paper claims that a new field of software engineering research and practice is emerging: search-based software engineering. The paper
argues that software engineering is ideal for the application of metaheuristic search techniques, such as genetic algorithms, simulated
annealing and tabu search. Such search-based techniques could provide solutions to the difficult problems of balancing competing (and
sometimes inconsistent) constraints and may suggest ways of finding acceptable solutions in situations where perfect solutions are either
theoretically impossible or practically infeasible.
In order to develop the field of search-based software engineering, a reformulation of classic software engineering problems as search
problems is required. The paper briefly sets out key ingredients for successful reformulation and evaluation criteria for search-based software
engineering. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Software engineering; Metaheuristic; Genetic algorithm
1. Introduction
Software engineers often face problems associated with
the balancing of competing constraints, trade-offs between
concerns and requirement imprecision. Perfect solutions are
often either impossible or impractical and the nature of the
problems often makes the definition of analytical algorithms
problematic.
Like other engineering disciplines, software engineering
is typically concerned with near optimal solutions or those
which fall within a specified acceptable tolerance. It is
precisely these factors which make robust metaheuristic
search-based optimisation techniques readily applicable.
Metaheuristic algorithms, such as genetic algorithms
(GA) [17], simulated annealing [37] and tabu search [16]
have been applied successfully to a number of engineering
GA research and researchers have even received interest from observers in the field of social science. Though GA
practitioners may not agree with the findings of sociologists [18], it is an indication of the wide appreciation of the
significance of these search-based technologies that they should have penetrated the collective consciousness of
even ‘non-technical’ disciplines such as social science.
However, the discipline of software engineering appears to be unique with regard to the application of genetic
algorithms (and similar search-based, metaheuristic optimisation techniques); metaheuristic algorithms have received
comparatively little attention from software engineers in comparison with that which they have received from
researchers and practitioners in the more established fields of engineering.
Information and Software Technology 43 (2001) 833–839
www.elsevier.com/locate/infsof
The first paper to use a meta-heuristic search technique was
probably the work of Boyer, Elspas and Levitt on the SELECT
system [16]. The paper is remarkable in many ways. Consider
the following paragraph, quoted from the paper:
“The limitation of the above algorithms to linear
combinations is an unacceptable, and vexing, one.
For example, they could not handle an inequality like
X∗Y +10∗Z− W ≥ 5 among its constraints, unless
one were prepared to assign to X a trial value,
and then attempt a solution (assuming the other
inequalities are linear). We therefore considered
various alternatives that would not be subject to this
limitation. The most promising of these alternatives
appears to be a conjugate gradient algorithm (‘hill
climbing’ program) that seeks to minimise a poten-
tial function constructed from the inequalities.” [16]
Here we can see, not only the first use of computational
search (hill climbing) in software engineering, but also a
hint at the idea (assignment of concrete values) that was
subsequently to become Dynamic Symbolic Execution (DSE)
[21]. Within this single paragraph we therefore may arguably
find the origins of both DSE and SBST (and, by extension,
SBSE too).
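The idea in the quoted paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the variables are integers, the potential function is the amount by which the inequality X∗Y + 10∗Z − W ≥ 5 is violated, and the neighbourhood is a unit step in one variable. The names `potential` and `hill_climb` are hypothetical, and the sketch uses plain hill climbing rather than the conjugate gradient method that SELECT is quoted as using.

```python
def potential(x, y, z, w):
    """Amount by which X*Y + 10*Z - W >= 5 is violated (0 when satisfied)."""
    return max(0, 5 - (x * y + 10 * z - w))


def hill_climb(start, max_steps=1000):
    """Repeatedly move to the unit-step neighbour with the lowest potential."""
    current = tuple(start)
    for _ in range(max_steps):
        if potential(*current) == 0:
            break  # the inequality holds, so this point is a solution
        neighbours = []
        for i in range(len(current)):
            for delta in (-1, 1):
                candidate = list(current)
                candidate[i] += delta
                neighbours.append(tuple(candidate))
        best = min(neighbours, key=lambda p: potential(*p))
        if potential(*best) >= potential(*current):
            break  # local optimum: no neighbour reduces the violation
        current = best
    return current


# Assumed starting point; any integer assignment would do.
solution = hill_climb((0, 0, 0, 0))
x, y, z, w = solution
print(solution, x * y + 10 * z - w >= 5)
```

Because the potential is zero exactly when the constraint holds, minimising it turns constraint solving into a search problem, which is the reformulation the quote hints at.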
The SELECT paper is also remarkable in its sober and
prescient assessment of the relative merits of testing and
verification. Shortly after its publication, these two closely
related research communities entered into a protracted and
unhelpful ‘feud’ that generated a great deal more heat than
light [29], [31], [35], [60]. Fortunately, we have more recently
witnessed an accommodation between the two communities
[61], and greater degree of welcome collaboration at their
intersection [59]. We really ought to ruefully reflect on the
delay in this rapprochement given the ‘understanding’ already
Fig. 1: Cumulative number of Search Based Software Testing
papers. As can be seen, the overall trend continues to suggest
a polynomial yearly rise in the number of papers, highlighting
the breadth of interest and strong health of SBST.
Unlike Boyer et al. [16], Miller and Spooner used concrete
execution of the program rather than symbolic execution,
making their approach more similar to the techniques that
ultimately became SBST, while the work of Boyer et al.
followed a closely-related (but different) evolutionary path,
which ultimately led to DSE. Current research develops both
these techniques, and also hybrids that combine the best
features of both [9], [63], [71], [110].
It appears that SBST research lay dormant for approximately
a decade until the work of Korel [68], which introduced
a practical test data generation approach, the Alternating
http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/
The number of publications per year from 1976 to 2012
Fig. 2: The changing ratio of SBSE papers that are SBST
papers. Initially, SBST dominated SBSE. Over the years, this
*M. Harman, Y. Jia, and Y. Zhang, “Achievements, open problems and challenges for search based software testing,” in Proc. 8th IEEE Int. Conf. Softw.
Testing, Verification and Validation, Apr. 2015, pp. 1–12.
How It Works? Simply
• A few general steps are common to all
the tools and algorithms:
1. Choose a search-based optimization technique.
2. Decide on an objective function (fitness function)
that will be a benchmark for the optimization
process.
3. Search for the best candidate among a list of
candidates.
4. Based on the chosen optimization technique, update
the list of candidates.
5. Iterate until no better candidates are found.
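The five steps above can be sketched as one generic loop. This is a minimal illustration, not any particular tool's implementation: the function names, the population size, the `patience` stopping rule, and the toy OneMax problem (maximise the number of 1 bits) are all assumptions made for the example.

```python
import random


def search(fitness, new_candidate, mutate, pop_size=20, patience=50, seed=0):
    """Generic search loop; the caller supplies steps 1 and 2."""
    rng = random.Random(seed)
    # Step 1 (technique) is fixed here as mutation-based local search.
    # Step 2 (objective) is the `fitness` callable passed in.
    candidates = [new_candidate(rng) for _ in range(pop_size)]
    best = max(candidates, key=fitness)  # Step 3: best of the current list
    stale = 0
    while stale < patience:  # Step 5: stop when nothing better turns up
        # Step 4: update the candidate list according to the technique.
        candidates = [mutate(best, rng) for _ in range(pop_size)]
        challenger = max(candidates, key=fitness)
        if fitness(challenger) > fitness(best):
            best, stale = challenger, 0
        else:
            stale += 1
    return best


def one_max(bits):
    """Toy fitness function: count of 1 bits."""
    return sum(bits)


def random_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]


def flip_one(bits, rng):
    """Neighbour of `bits` with one randomly chosen bit flipped."""
    child = list(bits)
    child[rng.randrange(len(child))] ^= 1
    return child


best = search(one_max, random_bits, flip_one)
print(best, one_max(best))
```

Swapping in a different technique only changes how step 4 builds the next candidate list; the fitness function and the stopping rule stay the same.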
PSO As An Example
• PSO = Particle Swarm Optimization
• "Particles" fly in this hyperspace and try to find the global minima/maxima, their movement
being governed by a simple mathematical equation.
\[
\begin{cases}
v_{t+1} = c_1 v_t + c_2 \,(p_{i,t} - x_t) + c_3 \,(p_{g,t} - x_t) \\
x_{t+1} = x_t + v_{t+1}
\end{cases}
\]
where
v_t = velocity at time step t
x_t = position at time step t
p_{i,t} = best previous position, at time step t
p_{g,t} = best neighbour's previous best, at time step t
c_1, c_2, c_3 = cognitive/social confidence coefficients
The three terms of the velocity update correspond to the particle itself, the particle's personal best, and the particle's neighbours' best.
[Figure: a particle at x_t with velocity v_t moves to x_{t+1}, drawn toward p_i and p_g.]
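A minimal sketch of the update rule, v(t+1) = c1·v(t) + c2·(p_i − x) + c3·(p_g − x) followed by x(t+1) = x(t) + v(t+1), is given below. Several choices are assumptions that the slide does not state: the neighbourhood is the whole swarm, the two attraction terms are scaled by uniform random factors (a common practice), and the coefficient values, bounds, and the sphere test function are picked only for illustration.

```python
import random


def pso(objective, dim, n_particles=20, iterations=200,
        c1=0.7, c2=1.5, c3=1.5, bounds=(-5.0, 5.0), seed=1):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_i = [list(x) for x in xs]            # each particle's personal best
    p_g = list(min(p_i, key=objective))    # best position among neighbours
    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r2, r3 = rng.random(), rng.random()  # assumed random scaling
                vs[k][d] = (c1 * vs[k][d]
                            + c2 * r2 * (p_i[k][d] - xs[k][d])
                            + c3 * r3 * (p_g[d] - xs[k][d]))
                xs[k][d] += vs[k][d]       # x(t+1) = x(t) + v(t+1)
            if objective(xs[k]) < objective(p_i[k]):
                p_i[k] = list(xs[k])
                if objective(p_i[k]) < objective(p_g):
                    p_g = list(p_i[k])
    return p_g


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best = pso(sphere, dim=2)
print(best, sphere(best))
```

For a maximisation problem the comparisons flip; the update equations themselves do not change.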
“Remember”
Your problem must be a non-deterministic problem!
• A non-deterministic algorithm can exhibit different behaviors on different runs
even for the same input, as opposed to a deterministic algorithm.
• There should not be an exact algorithm to solve your problem.
• Nowadays, many approaches apply search-based algorithms to deterministic problems
just to catch attention and use some fancy representations.
• But they are the wrong algorithms for such problems.
“Two important components”
1- The problem representation
2- The Fitness Function
Fitness Function: The Key Ingredient
• In Software Engineering: the fitness function converts some
requirement of the system into some measurable number.
Requirements
• Functional: like the coverage of some attribute of the system
• Non-functional:
-Security
-Usability
-Safety
-Efficiency
-Energy Consumption
-Scalability
-Availability
Some Functional Applications
Test Case Generation And Minimization
• 34 switches = 2^34 ≈ 1.7 × 10^10 possible inputs = 1.7 × 10^10 tests?
• Every possible combination of inputs? Which interactions cause faults?
• How many test cases do we need?
The combinatorial interaction approach
Payment Server | Smart Phone | Web Server | User Browser | Business Database
Master         | iPhone      | iPlanet    | Chrome       | SQL
Visa           | Blackberry  | Apache     | Explorer     | Oracle
               |             |            | Firefox      | Access
Test Case Minimization Approach
Red color tuples 25% of the t-tuples
Green color tuples 50% of the t-tuples
Blue color tuples 75% of the t-tuples
Brown color tuples 100% of the t-tuples
How To Use The Fitness Function?
PSTG strategy: how to use the fitness function to choose the best test case,
i.e., the one that covers most of the interactions
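A fitness function of this kind can be sketched as "the number of still-uncovered interactions a candidate test case would cover". The example below uses the parameter values from the payment example and pairwise (t = 2) interactions. It is not the PSTG algorithm: PSTG searches for the best candidate with PSO, whereas this sketch simply scans every possible test case, which is only feasible because the example is tiny. The assignment of values to parameters and all function names are assumptions.

```python
from itertools import combinations, product

# Assumed reading of the example table: two parameters have three values.
PARAMETERS = {
    "Payment Server": ["Master", "Visa"],
    "Smart Phone": ["iPhone", "Blackberry"],
    "Web Server": ["iPlanet", "Apache"],
    "User Browser": ["Chrome", "Explorer", "Firefox"],
    "Business Database": ["SQL", "Oracle", "Access"],
}


def pairs_of(test):
    """All 2-way interactions exercised by one test case."""
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)}


def fitness(test, uncovered):
    """Number of still-uncovered pairs that this test case covers."""
    return len(pairs_of(test) & uncovered)


def greedy_pairwise(parameters):
    """Repeatedly add the test case with the highest fitness."""
    all_tests = list(product(*parameters.values()))
    uncovered = set().union(*(pairs_of(t) for t in all_tests))
    suite = []
    while uncovered:
        best = max(all_tests, key=lambda t: fitness(t, uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


suite = greedy_pairwise(PARAMETERS)
exhaustive = len(list(product(*PARAMETERS.values())))
print(len(suite), "pairwise tests instead of", exhaustive)
```

In a search-based strategy the `max(...)` scan is what the optimizer replaces: the swarm explores candidate test cases and the fitness value steers it toward those that cover the most remaining tuples.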
Some applications of combinatorial interaction
GUI Testing
The use of event based modeling in GUI testing
Configuration Generation
nest to be added to the FTS
es in the d-tuples list (Step
g as n-tuples remain in the
dure of the strategy.
enerated test suite, the strat-
tudy on a reliable artifact
correctness of the strategy
m. The generated test suite
internal structure of the ar-
ation, the test suite is filtered
d detected faults. In making
ency of the first stage (i.e.,
at of other strategies.
licly as tools to be down-
strategies are unavailable
n the same environment is
n. The proposed strategy is
gies, namely Jenny, TConfig,
experimental environment
ws 7 operating system, 64-
of RAM. The algorithms are
ents
ed by the size of the con-
pared with those strategies
and criteria to be converted into a weighted number. Each criteri-
on has an effect on the final result, which decides the rank and
monthly wage of the officer. The final number is the resulting point.
The program is selected because it has a nontrivial code base and
different configurations. Fig. 12 shows the main window of the
program.
The program regards different configurations as input factors. Each
input factor has different levels. For example, the user can choose
“No Degree,” “Primary,” “Secondary,” “Diploma,” “Bachelor,” “Master,”
and “Doctorate” levels for the “Degree” factor. Table 3 summarizes
the factors and levels for the program.
To this end, the input configuration of the program can be rep-
resented by one factor with seven levels, one factor with six levels,
eight factors with two levels each, and two factors with three levels
each. Thus, this input configuration can be notated in an MCA notation as
MCA(N; d, 7^1 6^1 2^8 3^2). We need 96,768 test cases to test
Table 3
Summary of the input factors and levels for the case study program.
No. | Factors        | Levels
1   | Degree         | [No Degree, Primary, Secondary, Diploma, Bachelor, Master, Doctor]
2   | Children       | [Non, 1, 2, 3, 4, More_than_4]
3   | Read           | [Checked, unchecked]
4   | Write          | [Checked, unchecked]
5   | Speak          | [Checked, unchecked]
6   | Understand     | [Checked, unchecked]
7   | New graduate   | [Checked, unchecked]
8   | Experience     | [Checked, unchecked]
9   | English        | [Checked, unchecked]
10  | Disability     | [Checked, unchecked]
11  | Marital status | [Single, Married, Widow]
12  | Resident       | [Local, Outsider, Foreigner]
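The exhaustive count quoted in the excerpt follows directly from the table: it is the product of the number of levels of each factor. A short check, with the level counts read off Table 3 (the dictionary name is just for illustration):

```python
from math import prod

# Number of levels per factor, as listed in Table 3.
LEVELS = {
    "Degree": 7, "Children": 6,
    "Read": 2, "Write": 2, "Speak": 2, "Understand": 2,
    "New graduate": 2, "Experience": 2, "English": 2, "Disability": 2,
    "Marital status": 3, "Resident": 3,
}

exhaustive = prod(LEVELS.values())  # 7 * 6 * 2**8 * 3**2
print(exhaustive)  # 96768
```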
Another Application
Control Engineering
Application of combinatorial test design
• Application of combinatorial interaction to PID
controller tuning.
Generating Configuration Tests For SPL
Avocado tool + CIT plugin
• The goal is to add the CIT plugin to the existing set of plugins
[Diagram labels: Development team; Avocado + CIT plugin; Quality Engineering team; Problem specification; CIT; CIT test cases]
Context and Structure
"Varianter" object (all combinations)
"Varianter" object (resulting combinations)
- CIT is hooked in the Avocado Job, after
the action of the YAML_TO_MUX plugin,
before the call to the Avocado Runner.
- The hook is triggered by the command
line option "--combinatorial N".
- CIT plugin receives the Varianter object,
serializes it, filters out the unneeded
variants, creates a new Varianter object
out of the remaining variants and returns
the object to the Avocado Job.
Avocado Multiplexer
Avocado Multiplexer + CIT : Benefits
• Consider the example on the left. There are
three branches: machine (5 leaves), image (6
leaves), and architecture (29 leaves).
• With the current Avocado
approach, there will be 5 × 6 × 29 = 870 test
cases.
• Using our CIT algorithm, we can reduce the
number of test cases to 175 while keeping
the test coverage at a very similar level.
• By testing the effectiveness of CIT, we found
that the reduced test set is as effective as the
exhaustive test set (i.e., 870 test cases).
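The numbers above can be sanity-checked with a little arithmetic. The slide does not state the interaction strength used; assuming pairwise (2-way) coverage, every combination of the two largest branches has to appear in its own test case, so no pairwise suite can be smaller than the product of the two largest leaf counts. The variable names below are illustrative only.

```python
from math import prod

# Leaf counts of the three multiplexer branches from the example.
leaves = {"machine": 5, "image": 6, "architecture": 29}

exhaustive = prod(leaves.values())  # all combinations
largest_two = sorted(leaves.values(), reverse=True)[:2]
pairwise_lower_bound = prod(largest_two)  # assumes 2-way coverage

print(exhaustive)            # 870
print(pairwise_lower_bound)  # 174
```

Under that assumption, the reported 175 test cases sit just above the theoretical minimum of 174.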
Input Parameter Analysis
*D. Richard Kuhn, Raghu N. Kacker, Yu Lei, Introduction to Combinatorial Testing, Chapman & Hall/CRC, 2013
Other Applications
• Input testing: A systematic sampling of the input
parameters for the system-under-test instead of random
selection.
• Model-based testing: Generating test cases for
many models, like class diagrams, state charts.
• Test suite prioritization: Arrange the test cases in
the most effective way.
Software Module Clustering
Preamble
• In line with customer demands for new functionality,
the number of lines of code (LOC) in software has increased tremendously in the last
10 years.
• From kilobytes -> megabytes -> gigabytes -> terabytes ...
Software Becomes More Complex
• How to do maintenance?
• Test engineers need to test more and more code!
Clustering
• Clustering = the process of organizing objects into groups
whose members are similar in some way.
• A cluster is therefore a collection of objects which are “similar” to one
another and “dissimilar” to the objects belonging to other clusters.
Clustering And Impact Analysis
Objectives
• Maximizing cohesion
• Cohesion describes the functional strength
of a module, i.e., similar
functions are grouped into a
specific module.
• Cohesion is a desirable design
component attribute: when
a change has to be made, it is
localized in a single
component.
• Minimizing coupling
• Coupling describes the interdependencies among
modules.
• Loose coupling means component
changes are unlikely to affect other
components.
• Shared variables or control information
exchange lead to tight coupling.
• Loose coupling can be achieved by
state decentralization (as in objects)
and component communication via
parameters or message passing.
Problems At Hand
Module Dependency Graph (MDG)
Modularization Quality And Modularization Factor
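The slide gives no formula here, so the following is only a sketch under an assumption: it uses the formulation commonly attributed to the Bunch clustering tool, where each cluster's modularization factor is MF = 2μ / (2μ + ε), with μ the number of edges inside the cluster and ε the number of edges crossing its boundary, and MQ is the sum of MF over all clusters. The toy MDG and both clusterings are made up for illustration.

```python
def modularization_quality(edges, cluster_of):
    """Sum of per-cluster modularization factors (assumed Bunch-style MQ)."""
    intra = {}  # edges with both ends in the same cluster (cohesion)
    inter = {}  # edges crossing a cluster boundary (coupling)
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    mq = 0.0
    for cluster in set(cluster_of.values()):
        mu = intra.get(cluster, 0)
        eps = inter.get(cluster, 0)
        if mu > 0:  # a cluster without internal edges contributes 0
            mq += 2 * mu / (2 * mu + eps)
    return mq


# Toy module dependency graph: directed edges between modules.
MDG = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("c", "d")]

cohesive = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
scattered = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

print(modularization_quality(MDG, cohesive))
print(modularization_quality(MDG, scattered))
```

A search-based clustering tool would use such a value as its fitness function: the cohesive grouping scores higher than the scattered one because it keeps most dependencies inside clusters.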
Non-Functional Properties
Energy Optimization
• Automatically transforming the color scheme of a mobile web application by rewriting the
server-side code and templates of the web application so that the resulting web application
generates pages that are more energy efficient when displayed on a smartphone.
Figure 1: the architecture of Nyx
Figure 2: Example HARG for Program 1.
To build the HARG, our approach parses the HOG to ag-
gregate the individual characters in each node’s FSA into
HTML tags. The traversal begins by traversing all of the
edges in the FSA associated with the root node of the HOG
and then following all of the outgoing edges of the root node
and repeating this process until all nodes in the HOG have
been traversed. During the traversal, the approach main-
tains a parse state that allows it to determine if it is cur-
rently parsing an HTML tag, attributes, or text. When the
More broadly, the techniques for obtaining the HOG can
lead to an over approximation of an application’s possible
HTML pages. In turn, this can lead to the identification
of spurious visual relationships that correspond to infeasible
paths. This does not cause a problem for the approach
as this merely introduces additional color relationship constraints
that must be accounted for while generating the
Color Transformation Scheme in Section 5.3.
5. COLOR TRANSFORMATION
In the second phase, the approach calculates the new energy
efficient color scheme for the application – the Color
Transformation Scheme (CTS). There are two requirements
for the CTS, it must: (1) use energy efficient colors as the
basis for the new color scheme, and (2) maintain the color relationships
between neighboring HTML elements. The first
requirement serves the general goal of the approach and the
second ensures that the color-transformed pages are readable
and, ideally, as visually appealing as the original pages. To
address the first requirement, the CTS should replace large
light colored background areas with dark colors (preferably
black), as mentioned in Section 2. To address the second
*Ding Li, Angelica Huyen Tran, William G.J. Halfond, “Making Web Applications More Energy Efficient for OLED Smartphones,” ICSE 2014
HTML Output Graph (HOG)
HTML Adjacency Relationship Graph (HARG)
Color Conflict Graph (CCG)
Color Transformation Scheme (CTS)
Learning-Based Software Engineering
• The use of machine learning techniques to learn from
a set of data or from an information retrieval system.
A Learning-Based Software Engineering Environment
Sidney C. Bailin, Robert H. Gattis, and Walt Truszkowski*
Abstract
We describe an initial prototype of a software engineering environment that combines case-based reasoning (CBR) and explanation-based learning (EBL) functions. CBR and EBL are used to evolve the environment's understanding of software principles as it is used. The case base serves as a repository for reusable solutions to software engineering problems. New solutions are synthesized from the case base through a process of adaptation, evaluation, and repair. When a new solution is returned, the user has the option of modifying it through a series of primitive edit operations. The environment is capable of abstracting from these operations using explanation-based generalization, and synthesizing a new repair rule on the basis of the abstraction. We have successfully taught the environment a non-trivial design repair rule by means of a single example, and have observed the environment apply this learned rule to the solution of a new input problem.
expertise on engineering software grows and develops over time. A knowledge-based computer system capable of offloading non-trivial engineering tasks from the human must also possess such an ability. If a KBSEE is ever to be able to move beyond "toy" problems, it must be able to learn.
Our analysis of alternative machine learning approaches led us to propose a combination of two techniques: case-based reasoning (CBR) and explanation-based learning (EBL). Case-based reasoning is, in essence, an approach to reusing previously acquired knowledge. New situations are recognized as having characteristics in common with situations previously encountered. Whatever has been learned through the previous encounter is then applied, with some adaptation if necessary, to the new situation. In the KBSEE, these situations (cases) are software engineering problems (for example, a set of requirements), and the reusable knowledge consists of solutions to such problems (e.g., a design meeting the requirements).
Explanation-based learning is an approach to learning
Two Main Applications
• Classification learning
• Learning from crowdsourcing
• Adaptive feedback system learning
[Diagram labels: SUT; Algo; Learn; Ver.]
Applied Soft Computing 62 (2018) 579–591
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
Learning to classify software defects from crowds: A novel approach
Jerónimo Hernández-González (a, *), Daniel Rodriguez (c), Iñaki Inza (a), Rachel Harrison (d), Jose A. Lozano (a, b)
(a) Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain
(b) Basque Center for Applied Mathematics BCAM, Bilbao, Spain
(c) Department of Computer Science, University of Alcala, Madrid, Spain
(d) Department of Computing, Oxford Brookes University, Oxford, UK
Article info
Article history:
Received 1 April 2017
Received in revised form 27 October 2017
Accepted 31 October 2017
Available online 7 November 2017
Keywords:
Learning from crowds
Orthogonal defect classification
Missing ground truth
Bayesian network classifiers
Abstract
In software engineering, associating each reported defect with a category allows, among many other
things, for the appropriate allocation of resources. Although this classification task can be automated
using standard machine learning techniques, the categorization of defects for model training requires
expert knowledge, which is not always available. To circumvent this dependency, we propose to apply
the learning from crowds paradigm, where training categories are obtained from multiple non-expert
annotators (and so may be incomplete, noisy or erroneous) and, dealing with this subjective class infor-
mation, classifiers are efficiently learnt. To illustrate our proposal, we present two real applications of
the IBM’s orthogonal defect classification working on the issue tracking systems from two different real
domains. Bayesian network classifiers learnt using two state-of-the-art methodologies from data labeled
by a crowd of annotators are used to predict the category (impact) of reported software defects. The
considered methodologies show enhanced performance regarding the straightforward solution (major-
ity voting) according to different metrics. This shows the possibilities of using non-expert knowledge
aggregation techniques when expert knowledge is unavailable.
© 2017 Elsevier B.V. All rights reserved.
Table 3
Examples of defects and the corresponding labelings provided by the annotators (L1 to L5).

Compendium dataset

Summary: Error Launching Compendium LD after install
Description: Hi team, Error message launching Compendium LD after initial install: Java Virtual Machine Launcher Could not find the main class: com.compendium.ProjectCompendium. Program will exit. I have run through the suggestion on the forums of adding the path to javaw in the .bat, and verified the path through a command prompt is successful, same error. Any other tips? Regards, Eric
L1 to L5: Installability, Other, Installability, Installability, Installability

Summary: Spell Checker
Description: Add a spelling checker to Compendium with the ability to switch on and off auto-spell checking.
L1 to L5: Requirements, Requirements, Requirements, Requirements, Requirements

Summary: Can small icons also work with images? Make small images?
Description: Right now when you choose small icons, it shrinks the normal Compendium icons but not any reference node images, so they stay really big. Can we add an option to shrink those proportionally as well?
L1 to L5: Requirements, Requirements, Usability, Requirements, Other

Summary: Text find/replace
Description: Global search/find/replace functionality. Maybe coupled with existing search parameters. Ability to change text in found nodes without having to open the nodes, edit the label/detail, etc.
L1 to L5: Requirements, Usability, Usability, Requirements, Requirements

Mozilla dataset

Summary: NSS autoconf does not include IRIX
Description: --enable-crypto does not work on IRIX as security/nss/configure.in does not define XP_UNIX and friends on IRIX.
L1 to L5: Requirements, Reliability, Maintenance, Requirements, Maintenance

Summary: Mozilla automatically checks the “Reassign bug to” radio button
Description: Mozilla automatically checks the “Reassign bug to” radio button in Bugzilla causing unintentional changes to bugs. Tested with win32 051404 mozilla win32 build on NT. More to come.
L1 to L5: Other, Other, Reliability, Reliability, Reliability

Summary: Installation failed with error -214 due to empty flash.xpi
Description: seen on mac commercial build 2001-05-09-04-trunk. The installers, both full and stub, failed with a -214 error. Though the installation “appears” complete, when launched, it crashes at the
L1 to L5: Installability, Installability, Installability, Installability, Installability
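The abstract names majority voting as the straightforward baseline for combining crowd labels. A minimal sketch of that baseline is shown below, using the five annotator labels of each defect in Table 3. The function name and the choice to return `None` on a tie are assumptions; the paper's own methods (Bayesian network classifiers learnt from crowd labels) are not reproduced here.

```python
from collections import Counter

# Annotator labels L1..L5 per defect summary, copied from Table 3.
LABELS = {
    "Error Launching Compendium LD after install":
        ["Installability", "Other", "Installability", "Installability", "Installability"],
    "Spell Checker":
        ["Requirements", "Requirements", "Requirements", "Requirements", "Requirements"],
    "Can small icons also work with images? Make small images?":
        ["Requirements", "Requirements", "Usability", "Requirements", "Other"],
    "Text find/replace":
        ["Requirements", "Usability", "Usability", "Requirements", "Requirements"],
    "NSS autoconf does not include IRIX":
        ["Requirements", "Reliability", "Maintenance", "Requirements", "Maintenance"],
    "Mozilla automatically checks the Reassign bug to radio button":
        ["Other", "Other", "Reliability", "Reliability", "Reliability"],
    "Installation failed with error -214 due to empty flash.xpi":
        ["Installability"] * 5,
}


def majority_vote(labels):
    """Most frequent label, or None when the top count is shared."""
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else None


consensus = {summary: majority_vote(votes) for summary, votes in LABELS.items()}
for summary, label in consensus.items():
    print(f"{label!s:15} {summary}")
```

The IRIX defect ends in a two-way tie, which illustrates why noisy crowd labels call for something more robust than a simple vote.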
roblem. A graphical description of the training process is
n Fig. 3. The use of the training crowd-labeled data by the
earning techniques is compared.
se experiments were carried out using our own imple-
n of the different learning algorithms and evaluation
. As our developments are written in Java, we can make
of several data management features currently imple-
n the popular software Weka [52]. In this way, the text
by users to describe defects (in two text fields, summary
ption) was processed. In a pre-processing stage, standard
niques have been used to extract a relevant set of vari-
m the text fields and transform the original database into
which can be handled by ML techniques. Specifically, the
tringToWordVector filter implemented in Weka [52] was
p-words were removed based on Rainbow [53], text was
to lowercase; the iterated version of the Lovins stem-
was applied as well as an alphabetic tokenizer where
the relatively recent emergence of the learning
paradigm, the model evaluation in this scenario is st
explored. In this paper, the evaluation strategy foll
on the same idea that confers its characteristic rob
majority voting: the combination of multiple indep
ments [9,40]. That is, the mean value of the perfor
calculated using the labels of one annotator at a t
truth is considered. In practice, all the experiments
are evaluated as follows (see Fig. 4): (1) after mode
performance of the model is estimated using the
each labeler, one at a time, as ground truth, and (2) t
of all the estimates is the final performance value.
imental results are obtained with a 10 × 5-fold cr
procedure [55].
SB And LB Combined
Learn from your testing output !
What is next ?
SUT
Search&Learn-based
Testing Strategy
System’s input Test output
On The Challenges Of Search Algorithm
• Changing the optimization algorithm will not necessarily
improve the output. (My Experience)
• It is the coverage criteria and the fitness function that lead
to better results. (My Experience)
SBSE And LBSE: Challenges With The Industry
• The non-deterministic output due to the randomness.
• Expensive computation.
• Modeling the system or the problem space.
• Realize that you need one of them.
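One common way to handle the non-deterministic output is to repeat each run with several fixed random seeds and report a summary instead of a single value. The sketch below is only an illustration under that assumption; the toy objective and the names `random_search` and `summarise` are hypothetical and do not come from the slides.

```python
import random
import statistics


def random_search(seed, evaluations=200):
    """Toy stochastic search: minimise f(x) = (x - 3)^2 by random sampling."""
    rng = random.Random(seed)  # private generator, so a given seed is repeatable
    best = float("inf")
    for _ in range(evaluations):
        x = rng.uniform(-10.0, 10.0)
        best = min(best, (x - 3.0) ** 2)
    return best


def summarise(seeds):
    """Run the search once per seed and summarise the spread of the results."""
    results = [random_search(s) for s in seeds]
    return {
        "best": min(results),
        "median": statistics.median(results),
        "worst": max(results),
    }


if __name__ == "__main__":
    print(summarise(range(30)))
```

Fixing the seed makes a single run reproducible, while the median and the best/worst spread show an industrial partner how much the result can vary between runs.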
Probably we need to change our way of
communication with industry

Academia-to-Industry Transition of Search and Learning- Based Software Engineering: Opportunities and Challenges

  • 1. Academia-To-Industry Transition Of Search And Learning- Based Software Engineering: Opportunities And Challenges Bestoun S. Ahmed, Ph.D. Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical University in Prague Karlovo náměstí 13, 121 35 Praha 2 @bestoon82 albeybes@fel.cvut.cz www.bestoun.net
  • 2. The Two Cultures • 1959- the clash of "the two cultures” • The humanities and the sciences. • A similar cleavage between the academy and industry.
  • 3. –But why are most of them not used by industry? “We are publishing many great solutions for nowadays’ problems”
  • 4. Academia-Industry Collaboration TRANSACTIONS OF THE AMERICAN CLINICAL AND CLIMATOLOGICAL ASSOCIATION, VOL. 113, 2002 ACADEMIC-INDUSTRIAL COLLABORATION: THE GOOD, THE BAD, AND THE UGLY JOSEPH B. MARTIN BOSTON, MA ABSTRACT Academic-industrial collaborations and technology transfer have over the past 50 years played an increasingly prominent role in the biomedical sciences. University partnerships with industry can expedite the availability of innovative drugs and other medical technologies, bringing both important public health benefits and a source of income for universities and their faculty through a variety of financial arrangements. However, these relationships raise ethical concerns, particularly when research involves human subjects in clinical trials. Lapses in oversight of industry-sponsored clinical trials at universities, and especially patient deaths in a number of trials, have brought these issues into the public spotlight and have led the federal government to intensify its oversight of clinical research. The leadership of Harvard Medical School convened a group of leaders in academic
  • 5. Academia-Industry: Two Different Missions • Academic mission: • Education and discovery driven by intellectual curiosity-what we in academia like to regard as "pure motives.” • Industry mission: • Translational research, commercialization, and profit making. Breaches between the two missions?
  • 6. Breach The Wall • Science, Technology, Engineering, Computer Science (19th century onwards) Patents > Licensing > Royalties • Medical Devices and Biotechnology (1950 onwards) • Basic science support from industry (1980 onwards) ACADEMIC-INDUSTRIAL COLLABORATION FIG. 1. The two cultures. Academia mission: education, discovery driven by intellectual curiosity: "pure motives". Industry mission: translational research, commercialization, profit making. wall separating these two activities, rendering an increasingly porous interface, admired by many and abhorred by some. The first breach in the wall developed around technology, engineering, and computer science, which led to a very deliberate process of patenting, licensing, and royalty income by major research universities engaged in the fundamental sciences (Table 1). During the last 50 years, with the National TABLE 1 Breaches in the Wall 1. Science, Technology, Engineering, Computer Science (19th century onwards) Patents > Licensing > Royalties 2. Medical Devices and Biotechnology (1950-2000): same process New ethical issue: agents or devices to be used in humans
  • 7. Information and Software Technology 79 (2016) 106–127 Contents lists available at ScienceDirect Information and Software Technology journal homepage: www.elsevier.com/locate/infsof Challenges and best practices in industry-academia collaborations in software engineering: A systematic literature review Vahid Garousi a,b,∗, Kai Petersen c, Baris Ozkan d a Software Engineering Research Group, Department of Computer Engineering, Hacettepe University, Ankara, Turkey b Maral Software Engineering Consulting Corporation, Calgary, Canada c Department of Software Engineering, School of Engineering, Blekinge Institute of Technology, Sweden d Department of Information Systems Engineering, Atilim University, Ankara, Turkey Article info Article history: Received 31 December 2015 Revised 5 May 2016 Accepted 23 July 2016 Available online 30 July 2016 Keywords: Software engineering, Industry-academia collaborations, Industry, Universities, Challenges, Success patterns, Best practices, Systematic literature review Abstract Context: The global software industry and the software engineering (SE) academia are two large communities. However, unfortunately, the level of joint industry-academia collaborations in SE is still relatively very low, compared to the amount of activity in each of the two communities. It seems that the two ’camps’ show only limited interest/motivation to collaborate with one other. Many researchers and practitioners have written about the challenges, success patterns (what to do, i.e., how to collaborate) and anti-patterns (what not to do) for industry-academia collaborations. Objective: To identify (a) the challenges to avoid risks to the collaboration by being aware of the challenges, (b) the best practices to provide an inventory of practices (patterns) allowing for an informed choice of practices to use when planning and conducting collaborative projects. Method: A systematic review has been conducted. Synthesis has been done using grounded-theory based coding procedures. Results: Through thematic analysis we identified 10 challenge themes and 17 best practice themes. A key outcome was the inventory of best practices, the most common ones recommended in different contexts *Study period 2003-2016
  • 8. The ratio of authors from academia, industry, and joint authorships (Published Research Papers) SE topic areas of the projects studied Types of contributions Research types
  • 9. –Some companies have NDA and it is hard to convince them to publish papers “Keep in mind !”
  • 10. The Relationship: Experience • Universities are changing the management vision to funding-oriented management. • Many requests for collaboration from academia. • Fewer responses from industry. Academia, Industry: many collaboration requests, fewer collaboration requests.
  • 12. Challenges To Collaborate In Soft Eng. • There are many challenges mentioned by Garousi et al.* • Most relevant to us: • Results produced through research are not relevant for practice • Researchers do not understand the relevant problems from an industry point of view • Different interests and objectives • Different reward systems • Lack of prior relationships between a company and academia • Lack of resources due to high investment in terms of resources • Licensing restrictions on tools *V. Garousi, K. Petersen, and B. Ozkan, “Challenges and best practices in industry-academia collaborations in software engineering: A systematic literature review,” Inf. Softw. Technol., vol. 79, pp. 106–127, Nov. 2016
  • 13. Barriers: Our Experience • Companies are interested in fast output. • Academia is interested in publication, which is generally not preferred by industry (avoidance of information disclosure). • Bureaucracy from the university side (especially the lawyers). • Bureaucracy from the industry side, especially to find the contact person. • Size of the company is important. • Collaboration and finding go inversely with the company size.
  • 14. Another chance to collaborate ? “Search and Learning-based Software Engineering”
  • 15. One Of My Meetings With Industry - Look, I don’t think search or learning-based algorithms will contribute to our work. - But, let us do the meeting. • That is basically the end of the meeting from the beginning.
  • 16. SBSE “The application of metaheuristic search-based optimization techniques to find near-optimal solutions in software engineering problems”
  • 17. • The term SBSE was first used in 2001 by Harman and Jones. • However, the application of optimization to a software engineering problem was reported by Webb Miller and David Spooner in 1976 in the area of software testing. Search-based software engineering Mark Harman a,*, Bryan F. Jones b,1 a Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK b School of Computing, University of Glamorgan, Pontypridd, CF37 1DL, UK Abstract This paper claims that a new field of software engineering research and practice is emerging: search-based software engineering. The paper argues that software engineering is ideal for the application of metaheuristic search techniques, such as genetic algorithms, simulated annealing and tabu search. Such search-based techniques could provide solutions to the difficult problems of balancing competing (and some times inconsistent) constraints and may suggest ways of finding acceptable solutions in situations where perfect solutions are either theoretically impossible or practically infeasible. In order to develop the field of search-based software engineering, a reformulation of classic software engineering problems as search problems is required. The paper briefly sets out key ingredients for successful reformulation and evaluation criteria for search-based software engineering. © 2001 Elsevier Science B.V. All rights reserved. Keywords: Software engineering; Metaheuristic; Genetic algorithm 1. Introduction Software engineers often face problems associated with the balancing of competing constraints, trade-offs between concerns and requirement imprecision. Perfect solutions are often either impossible or impractical and the nature of the problems often makes the definition of analytical algorithms problematic. Like other engineering disciplines, software engineering is typically concerned with near optimal solutions or those which fall within a specified acceptable tolerance. It is precisely these factors which make robust metaheuristic search-based optimisation techniques readily applicable. Metaheuristic algorithms, such as genetic algorithms (GA) [17], simulated annealing [37] and tabu search [16] have been applied successfully to a number of engineering GA research and researchers have even received interest from observers in the field of social science. Though GA practitioners may not agree with the findings of sociologists [18], it is an indication of the wide appreciation of the significance of these search-based technologies that they should have penetrated the collective consciousness of even `non-technical' disciplines such as social science. However, the discipline of software engineering appears to be unique with regard to the application of genetic algorithms (and similar search-based, metaheuristic optimisation techniques); metaheuristic algorithms have received comparatively little attention from software engineers in comparison with that which they have received from researchers and practitioners in the more established fields of engineering. Information and Software Technology 43 (2001) 833–839 www.elsevier.com/locate/infsof
  • 18. The first paper to use a meta-heuristic search technique was probably the work of Boyer, Elspas and Levitt on the SELECT system [16]. The paper is remarkable in many ways. Consider the following paragraph, quoted from the paper: “The limitation of the above algorithms to linear combinations is an unacceptable, and vexing, one. For example, they could not handle an inequality like X∗Y + 10∗Z − W ≥ 5 among its constraints, unless one were prepared to assign to X a trial value, and then attempt a solution (assuming the other inequalities are linear). We therefore considered various alternatives that would not be subject to this limitation. The most promising of these alternatives appears to be a conjugate gradient algorithm (‘hill climbing’ program) that seeks to minimise a potential function constructed from the inequalities.” [16] Here we can see, not only the first use of computational search (hill climbing) in software engineering, but also a hint at the idea (assignment of concrete values) that was subsequently to become Dynamic Symbolic Execution (DSE) [21]. Within this single paragraph we therefore may arguably find the origins of both DSE and SBST (and, by extension, SBSE too). The SELECT paper is also remarkable in its sober and prescient assessment of the relative merits of testing and verification. Shortly after its publication, these two closely related research communities entered into a protracted and unhelpful ‘feud’ that generated a great deal more heat than light [29], [31], [35], [60]. Fortunately, we have more recently witnessed an accommodation between the two communities [61], and greater degree of welcome collaboration at their intersection [59]. We really ought to ruefully reflect on the delay in this rapprochement given the ‘understanding’ already Fig. 1: Cumulative number of Search Based Software Testing papers. As can be seen, the overall trend continues to suggest a polynomial yearly rise in the number of papers, highlighting the breadth of interest and strong health of SBST. Unlike Boyer et al. [16], Miller and Spooner used concrete execution of the program rather than symbolic execution, making their approach more similar to the techniques that ultimately became SBST, while the work of Boyer et al. followed a closely-related (but different) evolutionary path, which ultimately led to DSE. Current research develops both these techniques, and also hybrids that combine the best features of both [9], [63], [71], [110]. It appears that SBST research lay dormant for approximately a decade until the work of Korel [68], which introduced a practical test data generation approach, the Alternating http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/ The number of publications per year from 1976 to 2012 Fig. 2: The changing ratio of SBSE papers that are SBST papers. Initially, SBST dominated SBSE. Over the years, this *M. Harman, Y. Jia, and Y. Zhang, “Achievements, open problems and challenges for search based software testing,” in Proc. 8th IEEE Int. Conf. Softw. Testing, Verification Validation, Apr. 2015, pp. 1–12.
  • 19. How It Works? Simply • There are a few general steps that are common among all the tools and algorithms: 1. Choose a search-based optimization technique. 2. Decide on an objective function (fitness function) that will be a benchmark for the optimization process. 3. Search for the best candidate among a list of candidates. 4. Based on the chosen optimization technique, update the list of candidates. 5. Iterate until no better candidates are found.
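The five steps can be sketched with the simplest search technique, hill climbing. This is a minimal illustration, not the procedure of any particular tool; the toy objective and the helper names `toy_fitness` and `toy_neighbours` are hypothetical.

```python
def hill_climb(fitness, start, neighbours, max_iters=1000):
    """Generic local search (maximisation) following the steps above."""
    current = start
    for _ in range(max_iters):
        candidates = neighbours(current)        # step 4: updated list of candidates
        best = max(candidates, key=fitness)     # step 3: best candidate in the list
        if fitness(best) <= fitness(current):   # step 5: stop when nothing is better
            break
        current = best
    return current


# Hypothetical toy problem: find the integer x in [0, 100] that maximises -(x - 42)^2.
def toy_fitness(x):
    """Step 2: the objective (fitness) function that guides the search."""
    return -(x - 42) ** 2


def toy_neighbours(x):
    """Candidates reachable from x in one move, kept inside the search space."""
    return [v for v in (x - 1, x + 1) if 0 <= v <= 100]


if __name__ == "__main__":
    print(hill_climb(toy_fitness, 0, toy_neighbours))
```

Swapping hill climbing for a genetic algorithm or PSO changes only how the candidate list is updated; the fitness function stays the same.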
  • 20. PSO As An Example • PSO = Particle Swarm Optimization • "Particles" fly in this hyperspace and try to find the global minima/maxima, their movement being governed by a simple mathematical equation:
$$v_{t+1} = c_1 v_t + c_2 \,(p_{i,t} - x_t) + c_3 \,(p_{g,t} - x_t), \qquad x_{t+1} = x_t + v_{t+1}$$
where $v_t$ = velocity at time step $t$; $x_t$ = position at time step $t$; $p_{i,t}$ = best previous position, at time step $t$; $p_{g,t}$ = best neighbour's previous best, at time step $t$; $c_1, c_2, c_3$ = cognitive/social confidence coefficients.
(Diagram: the move from $x_t$ to $x_{t+1}$ combines three components: the particle itself ($v_t$), the particle's personal best ($p_i$), and the particle's neighbours' best ($p_g$).)
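A minimal sketch of the update rule, minimising a simple sphere function. Several things here are assumptions rather than content of the slide: the coefficient values, the swarm size, the use of the whole swarm as each particle's neighbourhood, and the random factors `r2` and `r3` that most PSO variants multiply into the two attraction terms.

```python
import random


def pso(objective, dim, n_particles=20, iterations=200,
        c1=0.7, c2=1.5, c3=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    low, high = bounds
    x = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_i = [pos[:] for pos in x]                    # personal best positions
    p_i_val = [objective(pos) for pos in x]
    g = min(range(n_particles), key=lambda k: p_i_val[k])
    p_g, p_g_val = p_i[g][:], p_i_val[g]           # best position seen by the swarm

    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r2, r3 = rng.random(), rng.random()  # assumed random weights
                # v_{t+1} = c1*v_t + c2*(p_i - x_t) + c3*(p_g - x_t)
                v[k][d] = (c1 * v[k][d]
                           + c2 * r2 * (p_i[k][d] - x[k][d])
                           + c3 * r3 * (p_g[d] - x[k][d]))
                # x_{t+1} = x_t + v_{t+1}
                x[k][d] += v[k][d]
            value = objective(x[k])
            if value < p_i_val[k]:
                p_i[k], p_i_val[k] = x[k][:], value
                if value < p_g_val:
                    p_g, p_g_val = x[k][:], value
    return p_g, p_g_val


def sphere(position):
    """Toy objective with its minimum (0) at the origin."""
    return sum(coordinate ** 2 for coordinate in position)


best_pos, best_val = pso(sphere, dim=2)
print(best_pos, best_val)
```

Fixing the seed makes a run repeatable, which is a common way to cope with the non-determinism when results have to be compared.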
  • 21. Your problem must be a non-deterministic problem! “Remember” • A non-deterministic algorithm can, even for the same input, exhibit different behaviors on different runs, as opposed to a deterministic algorithm. • There should not be an exact algorithm that solves your problem. • Nowadays, many works apply search-based algorithms to deterministic problems just to catch attention and use some fancy representations. • But ... these are the wrong algorithms for such problems.
  • 22. “Two important components” 1- The problem representation 2- The fitness function
  • 23. Fitness Function: The Key Ingredient • In Software Engineering: the fitness function converts some requirement of the system into some measurable number. • Requirements: Functional (like the coverage of some attribute of the system) and Non-functional (security, usability, safety, efficiency, energy consumption, scalability, availability).
  • 25. Test Case Generation And Minimization • 34 switches = 2^34 = 1.7 x 10^10 possible inputs = 1.7 x 10^10 tests? • Every possible combination of inputs? Which interactions cause faults? • How many test cases do we need?
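The arithmetic behind the 34-switch example can be checked directly. The second half of the sketch counts the 2-way (pairwise) value combinations, assuming that pairwise coverage is the target; the slide itself only asks which interactions cause faults and does not fix the strength.

```python
from math import comb

switches = 34

# Exhaustive testing: every on/off combination of the 34 switches.
exhaustive = 2 ** switches

# Pairwise view (assumed strength t = 2): choose 2 of the 34 switches,
# then each chosen pair has 2 x 2 = 4 on/off value combinations.
switch_pairs = comb(switches, 2)
two_way_tuples = switch_pairs * 2 * 2

print(exhaustive)        # about 1.7 x 10^10
print(two_way_tuples)    # number of pairwise interactions to cover
```

The set of interactions to cover is many orders of magnitude smaller than the set of all inputs, and a single test covers many interactions at once, which is what makes minimization possible.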
  • 27. Payment Server: Master, Visa • Smart Phone: iPhone, Blackberry • Web Server: iPlanet, Apache • User Browser: Chrome, Explorer, Firefox • Business Database: SQL, Oracle, Access
  • 28. Test Case Minimization Approach • Red color tuples: 25% of the t-tuples • Green color tuples: 50% of the t-tuples • Blue color tuples: 75% of the t-tuples • Brown color tuples: 100% of the t-tuples
  • 29. How To Use The Fitness Function? PSTG Strategy - How to use the fitness function to choose the best test case, i.e. the one that covers most of the interactions
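One way to read "covers most of the interactions" is to score a candidate test case by the number of still-uncovered t-tuples it would cover. The sketch below assumes that reading; the parameter values and function names are hypothetical, and it is not the PSTG implementation.

```python
from itertools import combinations, product


def tuples_covered(test_case, t=2):
    """All t-way interactions, as (parameter index, value) groups, in one test case."""
    indexed = list(enumerate(test_case))
    return set(combinations(indexed, t))


def fitness(test_case, uncovered, t=2):
    """Number of not-yet-covered t-tuples that this test case would cover."""
    return len(tuples_covered(test_case, t) & uncovered)


# Hypothetical system with three two-valued parameters.
parameters = [("a", "b"), ("x", "y"), (0, 1)]
uncovered = set()
for candidate in product(*parameters):
    uncovered |= tuples_covered(candidate)
total_pairs = len(uncovered)

first = ("a", "x", 0)
first_score = fitness(first, uncovered)
uncovered -= tuples_covered(first)           # mark its pairs as covered

overlapping_score = fitness(("a", "x", 1), uncovered)   # shares the pair (a, x)
disjoint_score = fitness(("b", "y", 1), uncovered)      # shares nothing with `first`
print(total_pairs, first_score, overlapping_score, disjoint_score)
```

A search algorithm such as PSO would then move its particles towards test cases with a higher score, add the best one to the suite, and repeat until no tuple is left uncovered.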
  • 30. Some applications of combinatorial interaction
  • 31. GUI Testing The use of event based modeling in GUI testing
  • 32. Configuration Generation [...] and criteria to be converted into a weighted number. Each criterion has an effect on the final result, which decides the rank and monthly wage of the officer. The final number is the resulting point. The program is selected because it has a nontrivial code base and different configurations. Fig. 12 shows the main window of the program. The program regards different configurations as input factors. Each input factor has different levels. For example, the user can choose “No Degree,” “Primary,” “Secondary,” “Diploma,” “Bachelor,” “Master,” and “Doctorate” levels for the “Degree” factor. Table 3 summarizes the factors and levels for the program. To this end, the input configuration of the program can be represented by one factor with seven levels, one factor with six levels, eight factors with two levels each, and two factors with three levels each. Thus, this input configuration can be notated in an MCA notation as MCA(N; d, 7^1 6^1 2^8 3^2). We need 96,768 test cases to test [...]
Table 3: Summary of the input factors and levels for the case study program.
No. Factor: Levels
1 Degree: [No Degree, Primary, Secondary, Diploma, Bachelor, Master, Doctor]
2 Children: [Non, 1, 2, 3, 4, More_than_4]
3 Read: [Checked, unchecked]
4 Write: [Checked, unchecked]
5 Speak: [Checked, unchecked]
6 Understand: [Checked, unchecked]
7 New graduate: [Checked, unchecked]
8 Experience: [Checked, unchecked]
9 English: [Checked, unchecked]
10 Disability: [Checked, unchecked]
11 Marital status: [Single, Married, Widow]
12 Resident: [Local, Outsider, Foreigner]
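The figure of 96,768 exhaustive test cases follows from multiplying the level counts in Table 3. The short check below also computes a simple lower bound for a pairwise suite (the product of the two largest level counts, since every combination of those two factors must appear at least once); that bound is added for illustration and is not a number quoted in the excerpt.

```python
from math import prod

# Level counts from Table 3: one factor with 7 levels, one with 6,
# eight with 2 levels each, and two with 3 levels each.
levels = [7, 6] + [2] * 8 + [3] * 2

# Exhaustive testing needs one test per combination of levels.
exhaustive = prod(levels)

# Any pairwise suite needs at least (largest x second largest) tests.
largest, second_largest = sorted(levels, reverse=True)[:2]
pairwise_lower_bound = largest * second_largest

print(len(levels), exhaustive, pairwise_lower_bound)
```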
  • 34. Control Engineering Application of combinatorial test design • Application of combinatorial interaction to PID controller tuning.
  • 36. Avocado tool + CIT plugin • The goal is to add the CIT plugin to the other plugin set. (Diagram labels: Development team; Avocado + CIT plugin; Quality Engineering team; CIT; Test cases; Problem specification)
  • 37. CIT Context and Structure "Varianter" object (all combinations) "Varianter" object (resulting combinations) - CIT is hooked in the Avocado Job, after the action of the YAML_TO_MUX plugin, before the call to the Avocado Runner. - The hook is triggered by the command line option "--combinatorial N". - CIT plugin receives the Varianter object, serializes it, filters out the unneeded variants, creates a new Varianter object out of the remaining variants and returns the object to the Avocado Job.
  • 39. Avocado Multiplexer + CIT : Benefits • Consider the example on the left. There are three branches: machine (5 leaves), image (6 leaves) and architecture (29 leaves). • With the current Avocado approach, there will be 5 x 6 x 29 = 870 test cases. • Using our CIT algorithm, we can reduce the number of test cases to 175 while keeping the test coverage at a very similar level. • By testing the effectiveness of CIT, we found that the reduced test set is as effective as the exhaustive test set (i.e., 870 test cases).
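To get a feel for the reduction, the sketch below builds a pairwise suite for the same 5 x 6 x 29 shape with a plain greedy algorithm. This is a stand-in for illustration, not the CIT algorithm used in the plugin, and it is not expected to reproduce the 175 figure exactly; the leaf values are replaced by integer indices.

```python
from itertools import combinations, product


def pairs_of(test):
    """All 2-way (parameter index, value) interactions in one test."""
    return set(combinations(enumerate(test), 2))


def greedy_pairwise(level_counts):
    """Repeatedly add the candidate that covers the most uncovered pairs."""
    candidates = list(product(*(range(n) for n in level_counts)))
    uncovered = set()
    for candidate in candidates:
        uncovered |= pairs_of(candidate)
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# machine (5 leaves), image (6 leaves), architecture (29 leaves)
level_counts = [5, 6, 29]
exhaustive_size = 5 * 6 * 29
suite = greedy_pairwise(level_counts)

# Check that every pair of leaf values appears in at least one selected test.
all_pairs = set()
for candidate in product(*(range(n) for n in level_counts)):
    all_pairs |= pairs_of(candidate)
covered_pairs = set()
for test in suite:
    covered_pairs |= pairs_of(test)

print(exhaustive_size, len(suite), covered_pairs == all_pairs)
```

No pairwise suite for this shape can have fewer than 6 x 29 = 174 tests, because each image/architecture pair needs its own test, so a result of 175 is very close to the minimum.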
  • 40. Input Parameter Analysis * D. Richard Kuhn, Raghu N. Kacker, Yu Lei, Introduction to Combinatorial Testing, Chapman & Hall/CRC, 2013
  • 41. Other Applications • Input testing: A systematic sampling of the input parameters for the system-under-test instead of random selection. • Model-based testing: Generating test cases for many models, like class diagrams, state charts. • Test suite prioritization: Arrange the test cases in the most effective way.
  • 43. Preamble • In line with customer demands for new functionalities, the number of lines of code (LOC) in software has increased tremendously in the last 10 years. • From kilobytes -> megabytes -> gigabytes -> terabytes ...
  • 44. Software Becomes More Complex • How to do maintenance? • Test engineers need to test more and more code!
  • 45. Clustering • Clustering = the process of organizing objects into groups whose members are similar in some way. • A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
  • 47. Objectives • Maximizing cohesion • Describes the functional strength of a module, i.e. similar functions are grouped into a specific module. • Cohesion is a desirable attribute of a design component: when a change has to be made, it is localized in a single component. • Minimizing coupling • Describes the interdependencies among modules. • Loose coupling means component changes are unlikely to affect other components. • Shared variables or control information exchange lead to tight coupling. • Loose coupling can be achieved by state decentralization (as in objects) and component communication via parameters or message passing.
  • 50. Modularization Quality And Modularization Factor
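The slide gives only the title, so the sketch below assumes one widely used definition from the software clustering literature (the MQ measure associated with the Bunch tool): each cluster i gets a modularization factor MF_i = 2μ_i / (2μ_i + ε_i), where μ_i counts the edges inside the cluster (cohesion) and ε_i counts the edges crossing its boundary (coupling), and MQ is the sum of the MF_i. The dependency graph and module names are hypothetical.

```python
def modularization_quality(edges, cluster_of):
    """Sum of per-cluster modularization factors (assumed Bunch-style MQ)."""
    intra = {cluster: 0 for cluster in set(cluster_of.values())}
    inter = {cluster: 0 for cluster in intra}
    for source, target in edges:
        a, b = cluster_of[source], cluster_of[target]
        if a == b:
            intra[a] += 1          # cohesion: dependency stays inside the cluster
        else:
            inter[a] += 1          # coupling: dependency leaves cluster a ...
            inter[b] += 1          # ... and enters cluster b
    mq = 0.0
    for cluster in intra:
        if intra[cluster] > 0:     # a cluster with no internal edges contributes 0
            mq += 2 * intra[cluster] / (2 * intra[cluster] + inter[cluster])
    return mq


# Hypothetical dependency chain a -> b -> c -> d -> e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

good = modularization_quality(
    edges, {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"})
bad = modularization_quality(
    edges, {"a": "M1", "c": "M1", "e": "M1", "b": "M2", "d": "M2"})
print(good, bad)
```

Used as a fitness function, a higher MQ rewards clusterings that keep dependencies inside modules, which matches the two objectives of maximizing cohesion and minimizing coupling.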
  • 51.
  • 53. Energy Optimization • Automatically transforming the color scheme of a mobile web application by rewriting the server-side code and templates of the web application, so that the resulting web application generates pages that are more energy efficient when displayed on a smartphone. Figure 1: The architecture of Nyx. Figure 2: Example HARG for Program 1. To build the HARG, our approach parses the HOG to aggregate the individual characters in each node’s FSA into HTML tags. The traversal begins by traversing all of the edges in the FSA associated with the root node of the HOG and then following all of the outgoing edges of the root node and repeating this process until all nodes in the HOG have been traversed. During the traversal, the approach maintains a parse state that allows it to determine if it is currently parsing an HTML tag, attributes, or text. When the [...] More broadly, the techniques for obtaining the HOG can lead to an over approximation of an application’s possible HTML pages. In turn, this can lead to the identification of spurious visual relationships that correspond to infeasible paths. This does not cause a problem for the approach as this merely introduces additional color relationship constraints that must be accounted for while generating the Color Transformation Scheme in Section 5.3. 5. COLOR TRANSFORMATION In the second phase, the approach calculates the new energy efficient color scheme for the application – the Color Transformation Scheme (CTS). There are two requirements for the CTS, it must: (1) use energy efficient colors as the basis for the new color scheme, and (2) maintain the color relationships between neighboring HTML elements. The first requirement serves the general goal of the approach and the second ensures that the color-transformed pages are readable and, ideally, as visually appealing as the original pages.
To address the first requirement, the CTS should replace large light colored background areas with dark colors (preferably black), as mentioned in Section 2. To address the second [...] * Ding Li, Angelica Huyen Tran, William G.J. Halfond, “Making Web Applications More Energy Efficient for OLED Smartphones,” ICSE 2014. • HTML Output Graph (HOG) • HTML Adjacency Relationship Graph (HARG) • Color Conflict Graph (CCG) • Color Transformation Scheme (CTS)
  • 54. Learning-Based Software Engineering • The use of machine learning techniques to learn from some set of data or information retrieval system. A Learning-Based Software Engineering Environment, Sidney C. Bailin, Robert H. Gattis, and Walt Truszkowski. Abstract: We describe an initial prototype of a software engineering environment that combines case-based reasoning (CBR) and explanation-based learning (EBL) functions. CBR and EBL are used to evolve the environment's understanding of software principles as it is used. The case base serves as a repository for reusable solutions to software engineering problems. New solutions are synthesized from the case base through a process of adaptation, evaluation, and repair. When a new solution is returned, the user has the option of modifying it through a series of primitive edit operations. The environment is capable of abstracting from these operations using explanation-based generalization, and synthesizing a new repair rule on the basis of the abstraction. We have successfully taught the environment a non-trivial design repair rule by means of a single example, and have observed the environment apply this learned rule to the solution of a new input problem. [...] expertise on engineering software grows and develops over time. A knowledge-based computer system capable of offloading non-trivial engineering tasks from the human must also possess such an ability. If a KBSEE is ever to be able to move beyond "toy" problems, it must be able to learn. Our analysis of alternative machine learning approaches led us to propose a combination of two techniques: case-based reasoning (CBR) and explanation-based learning (EBL). Case-based reasoning is, in essence, an approach to reusing previously acquired knowledge. New situations are recognized as having characteristics in common with situations previously encountered. Whatever has been learned through the previous encounter is then applied, with some adaptation if necessary, to the new situation.
In the KBSEE, these situations (cases) are software engineering problems (for example, a set of requirements), and the reusable knowledge consists of solutions to such problems (e.g., a design meeting the requirements). Explanation-based learning is an approach to learning [...]
  • 55. Two Main Applications • Classification learning • Learning from crowdsourcing • Adaptive feedback system learning (Diagram labels: SUT; Algo; Learn; Ver.)
  • 56. Applied Soft Computing 62 (2018) 579–591. Learning to classify software defects from crowds: A novel approach. Jerónimo Hernández-González (a,∗), Daniel Rodriguez (c), Iñaki Inza (a), Rachel Harrison (d), Jose A. Lozano (a,b). (a) Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain; (b) Basque Center for Applied Mathematics BCAM, Bilbao, Spain; (c) Department of Computer Science, University of Alcala, Madrid, Spain; (d) Department of Computing, Oxford Brookes University, Oxford, UK. Article history: Received 1 April 2017; Received in revised form 27 October 2017; Accepted 31 October 2017; Available online 7 November 2017. Keywords: Learning from crowds; Orthogonal defect classification; Missing ground truth; Bayesian network classifiers. Abstract: In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, where training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and, dealing with this subjective class information, classifiers are efficiently learnt. To illustrate our proposal, we present two real applications of the IBM’s orthogonal defect classification working on the issue tracking systems from two different real domains. Bayesian network classifiers learnt using two state-of-the-art methodologies from data labeled by a crowd of annotators are used to predict the category (impact) of reported software defects.
The considered methodologies show enhanced performance regarding the straightforward solution (majority voting) according to different metrics. This shows the possibilities of using non-expert knowledge aggregation techniques when expert knowledge is unavailable. © 2017 Elsevier B.V. All rights reserved.
  • 57. Table 3: Examples of defects and the corresponding labelings provided by the annotators (L1 to L5).
Compendium dataset
Summary: Error Launching Compendium LD after install. Description: Hi team, Error message launching Compendium LD after initial install: Java Virtual Machine Launcher Could not find the main class: com.compendium.ProjectCompendium. Program will exit. I have run through the suggestion on the forums of adding the path to javaw in the .bat, and verified the path through a command prompt is successful, same error. Any other tips? Regards, Eric. L1: Installability, L2: Other, L3: Installability, L4: Installability, L5: Installability.
Summary: Spell Checker. Description: Add a spelling checker to Compendium with the ability to switch on and off auto-spell checking. L1: Requirements, L2: Requirements, L3: Requirements, L4: Requirements, L5: Requirements.
Summary: Can small icons also work with images? Make small images? Description: Right now when you choose small icons, it shrinks the normal Compendium icons but not any reference node images, so they stay really big. Can we add an option to shrink those proportionally as well? L1: Requirements, L2: Requirements, L3: Usability, L4: Requirements, L5: Other.
Summary: Text find/replace. Description: Global search/find/replace functionality. Maybe coupled with existing search parameters. Ability to change text in found nodes without having to open the nodes, edit the label/detail, etc. L1: Requirements, L2: Usability, L3: Usability, L4: Requirements, L5: Requirements.
Mozilla dataset
Summary: NSS autoconf does not include IRIX. Description: –enable-crypto does not work on IRIX as security/nss/configure.in does not define XP UNIX and friends on IRIX. L1: Requirements, L2: Reliability, L3: Maintenance, L4: Requirements, L5: Maintenance.
Summary: Mozilla automatically checks the “Reassign bug to” radio button. Description: Mozilla automatically checks the “Reassing bug to” radio button in Bugzilla causing unintentional changes to bugs. Tested with win32 051404 mozilla win32 build on NT. More to come. L1: Other, L2: Other, L3: Reliability, L4: Reliability, L5: Reliability.
Summary: Installation failed with error -214 due to empty flash.xpi. Description: seen on mac commercial build 2001-05-09-04-trunk. The installers, both full and stub, failed with a -214 error. Though the installation “appears” complete, when launched, it crashes at the [...] L1: Installability, L2: Installability, L3: Installability, L4: Installability, L5: Installability.
[...] problem. A graphical description of the training process is [...] in Fig. 3. The use of the training crowd-labeled data by the [...] learning techniques is compared. These experiments were carried out using our own implementation of the different learning algorithms and evaluation [...]. As our developments are written in Java, we can make use of several data management features currently implemented in the popular software Weka [52]. In this way, the text [...] by users to describe defects (in two text fields, summary and description) was processed. In a pre-processing stage, standard [...] techniques have been used to extract a relevant set of variables from the text fields and transform the original database into [...] which can be handled by ML techniques. Specifically, the StringToWordVector filter implemented in Weka [52] was [...]: stop-words were removed based on Rainbow [53], text was [...] to lowercase; the iterated version of the Lovins stemmer was applied as well as an alphabetic tokenizer where [...]
[...] the relatively recent emergence of the learning [...] paradigm, the model evaluation in this scenario is still [...] explored. In this paper, the evaluation strategy followed [...] on the same idea that confers its characteristic robustness [...] majority voting: the combination of multiple independent judgments [9,40]. That is, the mean value of the performance [...] calculated using the labels of one annotator at a time [...] truth is considered. In practice, all the experiments are evaluated as follows (see Fig. 4): (1) after model [...] performance of the model is estimated using the [...] each labeler, one at a time, as ground truth, and (2) the [...] of all the estimates is the final performance value. [...] experimental results are obtained with a 10 × 5-fold cross-validation procedure [55].
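The two ideas in this excerpt, majority voting as the straightforward baseline and evaluation against each annotator in turn, can be sketched with the seven rows of Table 3. The short row keys, the tie-breaking rule and the use of the majority vote as the "model" being evaluated are assumptions for illustration; the paper itself learns Bayesian network classifiers.

```python
from collections import Counter

# Labels L1..L5 for the seven defects in Table 3 (row keys are shortened).
crowd_labels = {
    "launch error":  ["Installability", "Other", "Installability",
                      "Installability", "Installability"],
    "spell checker": ["Requirements"] * 5,
    "small icons":   ["Requirements", "Requirements", "Usability",
                      "Requirements", "Other"],
    "find/replace":  ["Requirements", "Usability", "Usability",
                      "Requirements", "Requirements"],
    "NSS autoconf":  ["Requirements", "Reliability", "Maintenance",
                      "Requirements", "Maintenance"],
    "radio button":  ["Other", "Other", "Reliability",
                      "Reliability", "Reliability"],
    "error -214":    ["Installability"] * 5,
}


def majority_vote(votes):
    """Most frequent label; ties go to the label seen first (assumed rule)."""
    counts = Counter(votes)
    return max(counts, key=lambda label: (counts[label], -votes.index(label)))


def mean_accuracy(predictions, labels_by_defect, n_annotators=5):
    """Accuracy against each annotator taken as ground truth, then averaged."""
    accuracies = []
    for k in range(n_annotators):
        hits = sum(predictions[defect] == votes[k]
                   for defect, votes in labels_by_defect.items())
        accuracies.append(hits / len(labels_by_defect))
    return sum(accuracies) / n_annotators


predictions = {defect: majority_vote(votes)
               for defect, votes in crowd_labels.items()}
score = mean_accuracy(predictions, crowd_labels)
print(predictions)
print(round(score, 3))
```

The "NSS autoconf" row shows why majority voting is fragile: two labels tie at two votes each, so the outcome depends entirely on the tie-breaking rule.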
  • 58. SB And LB Combined
  • 59. Learn from your testing output! What is next? (Diagram labels: SUT; Search & Learn-based Testing Strategy; System’s input; Test output)
  • 60. On The Challenges Of Search Algorithm • Changing the optimization algorithm will not necessarily change the output for the better. (My Experience) • It is the coverage criteria and the fitness function that lead to better results. (My Experience)
  • 61. SBSE And LBSE: Challenges With The Industry • The non-deterministic output due to the randomness. • Expensive computation. • Modeling the system or the problem space. • Realizing that you need one of them. We probably need to change the way we communicate with industry.