Seminar-workshop: Introduction to Search-Based Software Engineering

Francisco Chicano
Departamento de Lenguajes y Ciencias de la Computación
Universidad de Almería, 26 and 27 October 2020 (online)

José Francisco Chicano García
chicano@lcc.uma.es
@francischicano
www.franciscochicano.es

Schedule
Time         Monday 26                     Tuesday 27
9:00-10:30   Introduction to SBSE and NRP  Test case minimization
10:30-10:45  Break                         Break
10:45-12:15  NRP (continued)               Refactoring
12:15-12:30  Break                         Break
12:30-14:00  Software module clustering    Project scheduling and knowledge test
There will be a short knowledge test on Tuesday 27 in the last time slot

Materials for following the workshop
Software:
• RStudio (online version at https://rstudio.cloud)
• Symphony (open-source ILP solver)
• Rsymphony (R package that interfaces with Symphony)
Code and examples:
• Available on GitHub: https://github.com/jfrchicanog/TallerUAL2020
• And on RStudio Cloud: https://rstudio.cloud/project/1815713
Task: log in to RStudio and install Rsymphony

Outline
• Introduction to SBSE
• Next Release Problem (NRP)
• Integer Linear Programming
• Multi-objective Optimization
• Software Module Clustering
• Test Case Minimization
• Automatic Software Refactoring
• Software Project Scheduling
• Conclusion
• Knowledge Test

Software Engineering

Search problems
A search problem is a binary relation R ⊆ X×Y such that, given an x ∈ X (instance), we want to find a y ∈ Y (solution) with (x,y) ∈ R
Examples of search problem instances:
- Find the prime factors of 15
- Find a string that matches the regular expression a*b
- Find a real number x that minimizes the expression (x-1)^2
We will focus mainly on one subtype of search problems: optimization problems

Optimization problems
An optimization problem is a pair P = (S,f) where:
S is a set of solutions (or search space)
f: S → ℝ is an objective function to minimize or maximize
If our goal is to minimize the function, we look for:
s' ∈ S | f(s') ≤ f(s), ∀s ∈ S
[Figure: objective function landscape marking the global maximum, a local maximum, the global minimum and a local minimum]
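The definition above can be illustrated with a small sketch. It assumes a finite search space, so that the global minimum can be found by scanning every solution; the function f, the set S and the neighbourhood used to detect local minima are hypothetical choices made only for this illustration.

```python
def global_minimum(space, f):
    """Return an s' in S with f(s') <= f(s) for every s in S (exhaustive scan)."""
    return min(space, key=f)


def local_minima(space, f, neighbours):
    """Return the points that are no worse than any of their neighbours in S."""
    members = set(space)
    return [s for s in space
            if all(f(s) <= f(t) for t in neighbours(s) if t in members)]


# Hypothetical instance: S = {-4, ..., 4} and f(x) = (x^2 - 4)^2 + x.
S = list(range(-4, 5))
f = lambda x: (x * x - 4) ** 2 + x
step = lambda x: (x - 1, x + 1)  # assumed neighbourhood: adjacent integers

best = global_minimum(S, f)
print(best, f(best))
print(local_minima(S, f, step))
```

With this choice of f there are two local minima, and only one of them is the global minimum, which is the situation sketched in the landscape figure.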

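Optimization algorithms
OPTIMIZATION TECHNIQUES
• EXACT
  - Calculus-based: gradient, Lagrange multipliers
  - Exhaustive: dynamic programming, branch and bound, ILP solver
• APPROXIMATE
  - Ad hoc heuristics
  - Metaheuristics
    - Trajectory-based: SA (simulated annealing), VNS (variable neighborhood search), TS (tabu search)
    - Population-based: EA (evolutionary algorithms), ACO (ant colony optimization), PSO (particle swarm optimization)
• Hybrids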

Search-Based Software Engineering (SBSE)
[Figure: a search or optimization problem is passed to a search or optimization algorithm, which returns a solution; the landscape marks the global and local maxima and minima]

Search-Based Software Engineering (SBSE)
• Next release problem (requirements for the next version)
• Software module clustering
• Test case minimization
• Automatic refactoring
• Project scheduling

Next Release Problem (NRP)
Given:
Ø A set of requirements R = {r1, r2, ..., rn} …
Ø … each with a cost cj and a value sj (Bagnall et al. → clients)
Ø A set of functional interactions between requirements:
Ø Implication (ri before rj)
Ø Combination (ri together with rj)
Ø Exclusion (ri and rj not together)
Find a subset of requirements that, besides satisfying the interactions, minimizes the cost and maximizes the value:
$$\min\ \mathrm{cost}(X) = \sum_{j:\, r_j \in X} c_j \qquad \max\ \mathrm{value}(X) = \sum_{j:\, r_j \in X} s_j$$
Excerpt shown on the slide (cropped):
[...] Each requirement rj ∈ R has a cost cj for the company if it is [...] of requirement rj for client i is denoted by vij ∈ ℝ. [...] value added by including rj in the next release of [...] computed as the weighted sum of the importance values, $s_j = \sum_{i=1}^{m} w_i v_{ij}$. Requirements interact with each other, [...] a given development [...], which limits the alternatives [...]. Functional interactions between requirements are classified into:
- Implication or precedence (ri ⇒ rj). A requirement rj cannot [...] if another requirement ri has not been implemented previously.
- Combination or coupling. Requirements ri and rj [...] jointly in the software.
- Exclusion. Requirement ri cannot be included [...].
If we call X ⊆ R the set of selected requirements, [...] of X are given by the functions cost(X) and value(X), respectively. We will consider a multi-objective version [...] minimizes the cost and maximizes the value of the set of requirements.
Variants: Bagnall et al. and van der Akker et al.
In the client-based variant, the value of a selection $\hat{R}$ is
$$\mathrm{value}(\hat{R}) = \sum_{i=1}^{m} w_i \prod_{(j,i) \in Q} [\, j \in \hat{R} \,]$$
with the associated constraints
$$s_j \le r_i \quad \forall (i,j) \in Q \qquad\qquad r_j \le r_i \quad \forall (i,j) \in P$$

Next Release Problem (NRP): example
Clients (importance)
Requirement Cost Client 1 (4) Client 2 (2) Client 3 (5)
r1 2 x x
r2 4 x
r3 3 x x
r4 5 x
cost({r1, r3}) =
value({r1, r3}) =
cost({r1, r2, r3}) =
value({r1, r2, r3}) =
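The two functions of the example can be sketched in a few lines. The costs below are the ones in the table, but the assignment of requirements to clients is hypothetical: the x marks lost their columns in this transcript, so the sets in WANTS were chosen only to match the number of marks in each row. The value function assumes the client-based definition (a client contributes its importance only when all of its requirements are selected).

```python
# Costs come from the example table. The client-to-requirement assignment is
# HYPOTHETICAL, chosen only to match the number of x marks in each row.
COST = {"r1": 2, "r2": 4, "r3": 3, "r4": 5}
WEIGHT = {"client1": 4, "client2": 2, "client3": 5}
WANTS = {
    "client1": {"r1", "r3"},
    "client2": {"r1", "r2"},
    "client3": {"r3", "r4"},
}


def cost(selected):
    """Sum of the costs of the selected requirements."""
    return sum(COST[r] for r in selected)


def value(selected):
    """A client counts only if all of its requirements are selected."""
    return sum(w for c, w in WEIGHT.items() if WANTS[c] <= set(selected))


for subset in ({"r1", "r3"}, {"r1", "r2", "r3"}):
    print(sorted(subset), cost(subset), value(subset))
```

The cost values do not depend on the assumed assignment; the value results do, and will differ if the actual table assigns requirements to clients differently.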

Introduction to linear programming
A linear programming problem has the form
$$\max \sum_{j=1}^{n} c_j x_j$$
subject to:
$$\sum_{j=1}^{n} a_{1j} x_j \le b_1$$
$$\sum_{j=1}^{n} a_{2j} x_j \le b_2$$
$$\dots$$
$$\sum_{j=1}^{n} a_{mj} x_j \le b_m$$
$$x_j \ge 0 \qquad j = 1, 2, \dots, n$$
More compactly:
$$\max \sum_{j=1}^{n} c_j x_j \quad \text{subject to} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i \quad i = 1, 2, \dots, m, \qquad x_j \ge 0 \quad j = 1, 2, \dots, n$$
And in matrix form:
$$\max\ c \cdot x \quad \text{subject to} \quad Ax \le b, \quad x \ge 0$$

Example:
Maximize x1 + x2
Subject to:
-x1 + 9x2 ≤ 36
9x1 + x2 ≤ 45
x1, x2 ≥ 0
[Figure: feasible region in the (x1, x2) plane, both axes from 0 to 5, with level lines x1 + x2 = constant]
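The example can be checked without a solver. The sketch below is not how SYMPHONY works; it relies on the fact that a bounded linear program with a non-empty feasible region attains its optimum at a vertex, so for two variables it is enough to intersect every pair of constraint boundaries and keep the feasible points. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Every constraint is written as a*x1 + b*x2 <= c, including x1 >= 0 and x2 >= 0.
CONSTRAINTS = [(-1, 9, 36), (9, 1, 45), (-1, 0, 0), (0, -1, 0)]


def vertices(constraints):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        x1 = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        x2 = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x1 + b * x2 <= c for a, b, c in constraints):
            points.add((x1, x2))
    return points


best = max(vertices(CONSTRAINTS), key=lambda p: p[0] + p[1])
print(best, best[0] + best[1])
```

The best vertex is the intersection of the two main constraints, x1 = x2 = 9/2, with objective value 9.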

With Rsymphony
Maximize x1 + x2
Subject to:
-x1 + 9x2 ≤ 36
9x1 + x2 ≤ 45
x1, x2 ≥ 0
By default, matrix columns are filled first
Task: solve the program with RStudio

Integer linear programming
We add the restriction that the variables can only take integer values
Example:
Maximize x1 + x2
Subject to:
-x1 + 9x2 ≤ 36
9x1 + x2 ≤ 45
x1, x2 ≥ 0
x1, x2 integer
[Figure: feasible integer points in the (x1, x2) plane, both axes from 0 to 5]
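For an instance this small, the integer version can be solved by plain enumeration, which makes the gap with the continuous problem visible. This is only a sketch for illustration; an ILP solver uses branch-and-bound style search rather than listing every point.

```python
from itertools import product


def feasible(x1, x2):
    """Constraints of the example; non-negativity is enforced by the grid."""
    return -x1 + 9 * x2 <= 36 and 9 * x1 + x2 <= 45


# The constraints imply x1 <= 5 and x2 <= 4, so a 6 x 6 grid covers all candidates.
candidates = [p for p in product(range(6), repeat=2) if feasible(*p)]
best_value = max(x1 + x2 for x1, x2 in candidates)
optima = [p for p in candidates if sum(p) == best_value]
print(best_value, optima)
```

The integer optimum is x1 = x2 = 4 with value 8, below the value 9 reached at (4.5, 4.5) when the variables are allowed to be fractional.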

Integer linear programming with Rsymphony
Maximize x1 + x2
Subject to:
-x1 + 9x2 ≤ 36
9x1 + x2 ≤ 45
x1, x2 ≥ 0
x1, x2 integer
Task: solve the program with RStudio

Next Release Problem (NRP)

ILP model of NRP: objective
In our case we will solve the single-objective version of Bagnall et al., with the cost limited to a fraction of the total cost of implementing all the requirements
We define a set of n variables ri for the requirements and m variables si for the clients. They take the values 0 and 1.
If ri = 1, requirement i is implemented; if ri = 0, it is not
If si = 1, client i is satisfied (all of its requirements are implemented)
The value of client i for the company is wi
The cost of implementing requirement i is ci
The budget is B
Task: find the objective expression
$$\max \sum_{i=1}^{m} w_i s_i$$


ILP model of NRP: cost constraint
The budget is B
Task: find the cost constraint
$$\max \sum_{i=1}^{m} w_i s_i \quad \text{subject to} \quad \sum_{i=1}^{n} c_i r_i \le B$$

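ILP model of NRP: dependencies
The budget is B
Task: find the constraints for the dependencies between requirements (implication)
$$\max \sum_{i=1}^{m} w_i s_i \quad \text{subject to} \quad \sum_{i=1}^{n} c_i r_i \le B, \qquad r_j \le r_i \quad \forall (i,j) \in P$$
Here (i, j) ∈ P means that ri must be implemented before rj, so rj = 1 forces ri = 1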


ILP model of NRP: dependencies
The budget is B
Task: find the constraints for the dependencies between requirements (combination)
$$r_j = r_i \quad \forall (i,j) \in C$$

ILP model of NRP: client satisfaction
The budget is B
Task: find the client satisfaction constraints
$$\max \sum_{i=1}^{m} w_i s_i$$
subject to
$$\sum_{i=1}^{n} c_i r_i \le B$$
$$s_j \le r_i \quad \forall (i,j) \in Q$$
$$r_j \le r_i \quad \forall (i,j) \in P$$

ILP model of NRP
In the R implementation, the first n variables of the variable vector are used for the requirements and the remaining m variables for the clients
Relevant functions:
• readNrpInstance(file): reads an instance file and returns a list with an internal representation
• ilpModel(nrpInstance, budgetLimitFraction): takes a list with an instance and a fraction (a real number) and creates an ILP model for the instance
Example: [code listing not preserved]
Task: solve some instances with R
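The variable layout described above (requirements first, clients afterwards) can be sketched as follows. This is not the workshop's R code: the function names nrp_ilp and solve_by_enumeration, the instance and the row format are assumptions made for this illustration, and the tiny model is solved by trying every 0/1 vector instead of calling an ILP solver.

```python
from itertools import product


def nrp_ilp(costs, weights, requests, precedences, budget):
    """Build (obj, rows) with columns r_1..r_n followed by s_1..s_m.

    requests: pairs (i, j) meaning client j asks for requirement i (set Q).
    precedences: pairs (i, j) meaning r_i must come before r_j (set P).
    Each row is (coefficients, rhs) and encodes coefficients . x <= rhs.
    """
    n, m = len(costs), len(weights)
    obj = [0] * n + list(weights)
    rows = [(list(costs) + [0] * m, budget)]      # sum c_i r_i <= B
    for i, j in requests:                          # s_j - r_i <= 0
        row = [0] * (n + m)
        row[n + j], row[i] = 1, -1
        rows.append((row, 0))
    for i, j in precedences:                       # r_j - r_i <= 0
        row = [0] * (n + m)
        row[j], row[i] = 1, -1
        rows.append((row, 0))
    return obj, rows


def solve_by_enumeration(obj, rows):
    """Try every 0/1 vector; only viable for tiny instances."""
    best = None
    for x in product((0, 1), repeat=len(obj)):
        if all(sum(a * v for a, v in zip(row, x)) <= rhs for row, rhs in rows):
            score = sum(c * v for c, v in zip(obj, x))
            if best is None or score > best[0]:
                best = (score, x)
    return best


# Hypothetical instance (0-based indices): costs of r1..r4, weights of 3 clients.
costs, weights = [2, 4, 3, 5], [4, 2, 5]
Q = [(0, 0), (2, 0), (0, 1), (1, 1), (2, 2), (3, 2)]
budget = 0.5 * sum(costs)  # budget as a fraction of the total cost
value, x = solve_by_enumeration(*nrp_ilp(costs, weights, Q, [], budget))
print(value, x[:4], x[4:])
```

With half of the total cost as budget, only the first client of this hypothetical instance can be satisfied, by implementing r1 and r3.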

Multi-objective optimization
• In a multi-objective (MO) problem there are several objectives (functions) that we want to optimize
[Figure: objective space with axes f1 and f2, marking the efficient (non-dominated) solutions, the weakly efficient solutions, a non-supported solution and a dominated solution]

Multi-objective optimization
If we minimize both objectives:
[Figure: two fronts in the (f1, f2) plane]
• Convex front: easy to solve with weighted sums of objectives
• Concave front: cannot be solved with weighted sums of objectives
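The limitation of weighted sums can be seen on a handful of points. The sketch below assumes both objectives are minimized and uses hypothetical objective vectors; it filters the non-dominated points and then collects what a sweep over weights w in w·f1 + (1-w)·f2 is able to find.

```python
def dominates(a, b):
    """a dominates b when minimizing: no worse in every objective and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(points):
    """Points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_optima(points, steps=100):
    """Collect the minimizers of w*f1 + (1-w)*f2 over a grid of weights w."""
    found = set()
    for k in range(steps + 1):
        w = k / steps
        found.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return found


# Hypothetical objective vectors (f1, f2); (3, 3) sits in a concave part of the front.
points = [(0, 4), (4, 0), (3, 3), (5, 5)]
print(pareto_front(points))
print(sorted(weighted_sum_optima(points)))
```

The point (3, 3) is efficient but non-supported: no weight makes it the minimizer of the weighted sum, so the sweep only returns the two extreme points.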

What will the front look like in NRP?
[Figure: candidate front shapes with cost and value on the axes]

Some examples
[Figure 1. Pareto front and approximations of the metaheuristic algorithms (ACS, NSGAII, GRASP), plotted on value and cost axes for (a) dataset1 and (b) dataset2]
We must point out that these times again refer to a different machine (Pentium 4 at 3.2 GHz) and the goal was not to find the complete front, [...]
C., Domínguez-Ríos, del Águila, del Sagrado, Alba, JISBD 2016

Multi-objective NRP
Task: find the Pareto front for our example by hand
r1 2 x x
r2 4 x
r3 3 x x
r4 5 x

Multi-objective NRP
Task: compute the front using R

Software module clustering
We want to find a partition of a set of software modules so that the software is structured into subsystems that make it easier to develop and maintain

How to measure the quality of the solution obtained:
Intra-connectivity: measures the cohesion among modules belonging to the same subsystem.
Inter-connectivity: measures the coupling between modules that belong to different subsystems.
The Modularization Quality (MQ) of the system combines both.

Given a module dependency graph G = (V, A), we define a weight w for each edge. We call n the number of nodes (modules) and m the number of edges (number of relations or dependencies).
The modularization quality of the system is defined as
$$MQ = \sum_{k} MF_k, \qquad MF_k = \begin{cases} 0 & \text{if } i_k = 0 \\ \dfrac{i_k}{i_k + \frac{1}{2} j_k} & \text{if } i_k > 0 \end{cases}$$
where the sum ranges over the subsystems.
The value i (intra-connectivity) is the sum of the weights of the edges whose two endpoints are inside the subsystem. It measures cohesion.
The value j (inter-connectivity) is the sum of the weights of the edges with one endpoint inside the subsystem and the other outside. It measures coupling.
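The definition can be turned into a short function. The sketch below assumes the modularization factor MF = i / (i + j/2) for subsystems with i > 0 and 0 otherwise, integer edge weights, and a hypothetical four-module graph; the names are illustrative and do not come from the workshop code.

```python
from fractions import Fraction


def modularization_quality(edges, cluster_of):
    """MQ = sum of MF_k with MF_k = i_k / (i_k + j_k / 2), or 0 when i_k = 0.

    edges: iterable of (u, v, weight) with integer weights.
    cluster_of: dict mapping each node to its subsystem identifier.
    """
    intra, inter = {}, {}
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + w
        else:
            for c in (cu, cv):  # the edge leaves both subsystems
                inter[c] = inter.get(c, 0) + w
    total = Fraction(0)
    for c, i in intra.items():
        if i > 0:
            total += Fraction(2 * i, 2 * i + inter.get(c, 0))  # i / (i + j/2)
    return total


# Hypothetical graph: a chain a-b-c-d with unit weights.
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
two_groups = modularization_quality(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
one_group = modularization_quality(edges, {"a": 0, "b": 0, "c": 0, "d": 0})
all_apart = modularization_quality(edges, {"a": 0, "b": 1, "c": 2, "d": 3})
print(two_groups, one_group, all_apart)
```

On this graph the two-subsystem partition scores 4/3, a single subsystem scores 1, and putting every module in its own subsystem scores 0.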

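Software module clustering: example
Task: find MQ
$$MF_1 = MF_3 = MF_6 = MF_7 = MF_8 = 0$$
$$MF_5 = \frac{1}{1 + \frac{1}{2} \times 2} = \frac{1}{2} \qquad MF_2 = \frac{2}{2 + \frac{1}{2} \times 3} = \frac{4}{7} \qquad MF_4 = \frac{3}{3 + \frac{1}{2} \times 1} = \frac{6}{7}$$
$$MQ = \frac{1}{2} + \frac{4}{7} + \frac{6}{7} = \frac{27}{14} = 1.928571\ldots$$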

Software module clustering: questions
What is the value of MQ if all the modules are in different groups?
What is the value of MQ if all the modules are in the same group?
What is the maximum value that MQ can take?

Software module clustering: solving the problem
The number of partitions of a set of n elements is a Bell number
1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, …
This grows very fast!
Enumerative algorithms are infeasible when there are many modules
The problem is nonlinear (integer linear programming is ruled out)
Exact algorithms: branch and bound
Approximate algorithms: heuristics and metaheuristics
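The growth of the number of partitions can be reproduced with the Bell triangle, a standard recurrence in which each row starts with the last entry of the previous row and every further entry adds the entry above it. The function name is an illustrative choice.

```python
def bell_numbers(count):
    """Return the first `count` Bell numbers B_0, B_1, ... via the Bell triangle."""
    bells = []
    row = [1]
    for _ in range(count):
        bells.append(row[0])
        new_row = [row[-1]]                 # next row starts with the last entry
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
    return bells


numbers = bell_numbers(16)
print(numbers[:11])
print(numbers[15])  # number of partitions of a set of 15 modules
```

Already for 15 modules there are more than a thousand million partitions, which is why exhaustive enumeration stops being practical so quickly.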

Analysis of the model:
- If n = 1, MQ* = 0
- If n = 2, MQ* = 1
- If all the nodes are isolated, MQ = 0
- If there is a single subsystem (and more than one node), MQ = 1
- For k subsystems and n-k subsystems: MQ ≤ k
- Experimentally, the value of MQ* is usually low compared with the number of modules
- For fixed k, if there is a large difference in cardinality between the largest and the smallest group, a lower MQ value is obtained.
Format of a solution: x* = (1, 2, 1, 3, 1, 2)
MF_i ∈ [0, 1]
Why?

                    Enumerative                    B&B algorithm
Instance  MQ        Solutions visited  Time (s)    Solutions visited  Time (s)
MDG 8     1.92857   4140               0.09        6                  0.10
MDG 10    2.5       115975             0.14        11                 0.13
MDG 15    2.812     1382958545         226.00      24                 23.00
mtunis    2.314*    2.314*    121.00*
* Value obtained by the best heuristic algorithm of Praditwong et al.

Test Suite Minimization
Given:
Ø A set of test cases T = {t1, t2, ..., tn}
Ø A set of software elements to be covered (e.g., use cases) E = {e1, e2, ..., em}
Ø A coverage matrix M:
     e1  e2  e3  ...  em
t1   1   0   1   …    1
t2   0   0   1   …    0
…    …   …   …   …    …
tn   1   1   0   …    0
Find a subset of tests X ⊆ T maximizing coverage and minimizing the testing cost
3 Test Suite Minimization Problem
When a piece of software is modified, the new software is tested using previous test cases in order to check if new errors were introduced. This is known as regression testing. One problem related to regression testing is the Test Suite Minimization Problem (TSMP). This problem is equivalent to the Minimal Hitting Set Problem which is NP-hard [17]. Let T = {t1, t2, ..., tn} be a set of tests for a program where the cost of running test ti is ci and let E = {e1, e2, ..., em} be a set of elements of the program that we want to cover with the tests. After running all the tests T we find that each test can cover several program elements. This information is stored in a matrix M = [mij] of dimension n × m that is defined as:
$$m_{ij} = \begin{cases} 1 & \text{if element } e_j \text{ is covered by test } t_i \\ 0 & \text{otherwise} \end{cases}$$
The single-objective version of this problem consists in finding a subset of tests X ⊆ T with minimum cost covering all the program elements. In formal terms:
$$\text{minimize } cost(X) = \sum_{\substack{i=1 \\ t_i \in X}}^{n} c_i \qquad (2)$$
subject to:
∀ej ∈ E, ∃ti ∈ X such that element ej is covered by test ti, that is, mij = 1.
The multi-objective version of the TSMP does not impose the constraint of full coverage, but it defines the coverage as the second objective to optimize, leading to a bi-objective problem. In short, the bi-objective TSMP consists in finding a subset of tests X ⊆ T having minimum cost and maximum coverage. Formally:
$$\text{minimize } cost(X) = \sum_{\substack{i=1 \\ t_i \in X}}^{n} c_i \qquad (3)$$
$$\text{maximize } cov(X) = |\{ e_j \in E \mid \exists t_i \in X \text{ with } m_{ij} = 1 \}| \qquad (4)$$

Example
E = {e1, e2, e3, e4} and M:
     e1  e2  e3  e4
t1   1   0   1   0
t2   1   1   0   0
t3   0   0   1   0
t4   1   0   0   0
t5   1   0   0   1
t6   0   1   1   0
Assume unitary cost for tests: ci = 1
cost({t1, t5}) =
cov({t1, t5}) =
cost({t1, t2, t5}) =
cov({t1, t2, t5}) =
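The two functions of this example can be evaluated directly from the coverage matrix. The sketch below transcribes M as given and assumes unit costs, as the slide does; the helper names cost and cov mirror the notation of the slide.

```python
# Coverage matrix M of the example: rows t1..t6, columns e1..e4.
M = {
    "t1": (1, 0, 1, 0),
    "t2": (1, 1, 0, 0),
    "t3": (0, 0, 1, 0),
    "t4": (1, 0, 0, 0),
    "t5": (1, 0, 0, 1),
    "t6": (0, 1, 1, 0),
}


def cost(tests, unit_cost=1):
    """Total cost of the selected tests under unit costs."""
    return unit_cost * len(tests)


def cov(tests):
    """Number of elements covered by at least one selected test."""
    return sum(1 for j in range(4) if any(M[t][j] for t in tests))


for X in ({"t1", "t5"}, {"t1", "t2", "t5"}):
    print(sorted(X), cost(X), cov(X))
```

Adding t2 to {t1, t5} raises the cost by one and covers the only element that was still missing, e2.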

Modelling the TSM Problem using ILP
Let us use n Boolean variables ti and m Boolean variables ej:
- ti = 1 iff test i is selected
- ej = 1 iff element j is covered (it depends on the ti)
ci is the cost of test ti
Task: constraints relating covered elements and tests
Excerpt shown on the slide (cropped):
The single-objective formulation of TSMP is a p[...] formulation. Then, we can translate the 2-obj T[...] and then infer the translation of the 1-obj TSM[...] Let us introduce n binary variables ti ∈ {0, 1}[...] ti = 1 then the corresponding test case is inclu[...] the test case is not included. We also introduc[...] one for each program element to cover. If ej = 1 [...] is covered by one of the selected test cases a[...] covered by a selected test case.
The values of the ej variables are not indepe[...] variable ej must be 1 if and only if there exists a ti variable f[...] and ti = 1. The dependence between both sets of variables can [...] the following 2m PB constraints:
$$e_j \le \sum_{i=1}^{n} m_{ij} t_i \le n \cdot e_j \qquad 1 \le j \le m.$$
We can see that if the sum in the middle [...]


Modelling the TSM Problem using ILP
Task: expression for coverage
$$e_j \le \sum_{i=1}^{n} m_{ij} t_i \le n \cdot e_j \qquad 1 \le j \le m.$$
We can see that if the sum in the middle is zero (no test covers element ej) then the variable ej = 0. However, if the sum is greater than zero, ej = 1. Now we need to introduce a constraint related to each o[...] in order to transform the optimization problem in a decision [...] described in Section 2.2. These constraints are:
$$\sum_{i=1}^{n} c_i t_i \le B, \qquad \sum_{j=1}^{m} e_j \ge P,$$
where B ∈ ℤ is the maximum allowed cost and P ∈ {0, 1, [...]

Modelling the TSM Problem using ILP
Task: expression for cost
The cost of the selected tests is bounded from above by B and the number of covered elements is bounded from below by P:
$$\sum_{i=1}^{n} c_i t_i \le B, \qquad \sum_{j=1}^{m} e_j \ge P$$

Example
Task: find equations for this example
E = {e1, e2, e3, e4} and M:
     e1  e2  e3  e4
t1   1   0   1   0
t2   1   1   0   0
t3   0   0   1   0
t4   1   0   0   0
t5   1   0   0   1
t6   0   1   1   0
If we want to solve the 2-obj TSMP we need to instantiate E[...] (7). The result is:
e1 ≤ t1 + t2 + t4 + t5 ≤ 4e1 (10)
e2 ≤ t2 + t6 ≤ 4e2 (11)
e3 ≤ t1 + t3 + t6 ≤ 4e3 (12)
e4 ≤ t5 ≤ 4e4 (13)
t1 + t2 + t3 + t4 + t5 + t6 ≤ B
e1 + e2 + e3 + e4 ≥ P
where P, B ∈ ℕ.
If we are otherwise interested in the 1-obj version the formula[...]
t1 + t2 + t4 + t5 ≥ 1
t2 + t6 ≥ 1
t1 + t3 + t6 ≥ 1
t5 ≥ 1
t1 + t2 + t3 + t4 + t5 + t6 ≤ B
Slide annotations: the cost bound has the general form f(x) ≤ B, cost is minimized (min), coverage is maximized (max), and with n = 6 the linking constraints read:
e1 ≤ t1 + t2 + t4 + t5 ≤ 6e1
e2 ≤ t2 + t6 ≤ 6e2
e3 ≤ t1 + t3 + t6 ≤ 6e3
e4 ≤ t5 ≤ 6e4
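A quick way to gain confidence in the linking constraints is to check them exhaustively on this example. The sketch below assumes the general form e_j ≤ Σ m_ij t_i ≤ n·e_j with n = 6 and verifies that, for each of the 64 possible test selections, exactly one vector e satisfies the constraints and that it marks precisely the covered elements.

```python
from itertools import product

# Rows t1..t6, columns e1..e4 (matrix M of the example).
M = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 0),
     (1, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0)]
n, m = len(M), len(M[0])


def feasible_e(t):
    """All 0/1 vectors e with e_j <= sum_i m_ij t_i <= n * e_j for every j."""
    sums = [sum(M[i][j] * t[i] for i in range(n)) for j in range(m)]
    return [e for e in product((0, 1), repeat=m)
            if all(e[j] <= sums[j] <= n * e[j] for j in range(m))]


def coverage_indicator(t):
    """1 for each element covered by some selected test, 0 otherwise."""
    return tuple(int(any(M[i][j] and t[i] for i in range(n))) for j in range(m))


ok = all(feasible_e(t) == [coverage_indicator(t)]
         for t in product((0, 1), repeat=n))
print(ok)
```

The check succeeds for any coefficient that is at least the number of tests covering the element, which is why both 4 and 6 work here.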

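Algorithm for Solving the 2-obj TSM
[Figure: Pareto front in the plane with axes Cost and Coverage, built starting from the point of maximum coverage]
- Find max coverage
- Min cost, keeping coverage
- Decrease cost and find the maximum coverage again
- And again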
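The steps above can be sketched on the six-test example. In this illustration an exhaustive search over subsets stands in for the ILP solver, unit costs are assumed, and the function names are hypothetical; the loop alternates "maximize coverage under a cost bound" and "minimize cost keeping that coverage", then lowers the bound.

```python
from itertools import combinations

M = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 0),
     (1, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0)]
TESTS = range(len(M))


def cov(X):
    """Number of elements covered by the tests whose indices are in X."""
    return sum(1 for j in range(len(M[0])) if any(M[i][j] for i in X))


def subsets(max_cost):
    """All test subsets of unit cost at most max_cost (stand-in for the solver)."""
    return [X for k in range(max_cost + 1) for X in combinations(TESTS, k)]


def max_coverage(cost_bound):
    return max(cov(X) for X in subsets(cost_bound))


def min_cost(cov_bound):
    return min(len(X) for X in subsets(len(M)) if cov(X) >= cov_bound)


def pareto_front():
    front, bound = [], len(M)
    while bound >= 0:
        best_cov = max_coverage(bound)  # maximum coverage within the cost bound
        cost = min_cost(best_cov)       # minimum cost keeping that coverage
        front.append((cost, best_cov))
        bound = cost - 1                # decrease the cost and repeat
    return front


front = pareto_front()
print(front)
```

Each iteration yields one efficient (cost, coverage) pair, so the number of solver calls is proportional to the size of the front rather than to the number of subsets.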

TSM Instances
Instances from the Software-artifact Infrastructure Repository (SIR)
http://sir.unl.edu/portal/index.php
Instance      Tests  Elements to cover
printtokens1  4130   189
printtokens2  4115   199
replace       5542   242
schedule      2650   151
schedule2     2710   128
tcas          1608   65
totinfo       1052   124

Exercise
In the R implementation, the first n variables of the variable vector are used for the tests and the remaining m variables for the elements to cover
Relevant functions:
• readTsmInstance(file, unitaryCost=FALSE): reads an instance file and returns a list with an internal representation
• ilpModel4Tsm(tsmInstance, costUpperBound=NULL, covLowerBound=NULL): takes an instance and a bound on cost or coverage and creates an ILP model for the instance that optimizes the objective that is not bounded
• solveModel(model): solves the ILP model passed as a parameter
Example: [code listing not preserved]
Task: solve some instances with R

Exercise
Complete the function computeParetoFront so that it computes the complete front of an instance
Example: [code listing not preserved]
Task: complete computeParetoFront

Reduction in the Number of Test Cases
We can reduce the number of test cases in the original test suite
If a test covers at least the elements covered by another test, at no greater cost, the other test can be removed
     e1  e2  e3  ...  em
t1   1   0   0   …    1
t2   1   0   1   …    1
…    …   …   …   …    …
tn   1   1   0   …    0
Here t2 covers everything that t1 covers, so test t1 can be removed if c1 >= c2
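The reduction rule can be sketched as a dominance filter. This is not the workshop's reduceInstance function; the name reduce_tests and the tie-breaking rule (when two tests have identical coverage and cost, the one with the smaller name is kept) are assumptions made for this illustration, applied to the six-test example with unit costs.

```python
def reduce_tests(coverage, cost):
    """Drop every test dominated by another (superset coverage, no greater cost).

    coverage: dict test -> set of covered elements; cost: dict test -> cost.
    Identical tests are ordered by name so that exactly one copy survives.
    """
    def dominated(t, u):
        if not (coverage[t] <= coverage[u] and cost[u] <= cost[t]):
            return False
        identical = coverage[t] == coverage[u] and cost[t] == cost[u]
        return not identical or u < t

    return sorted(t for t in coverage
                  if not any(dominated(t, u) for u in coverage if u != t))


# The six-test example with unit costs.
coverage = {
    "t1": {"e1", "e3"}, "t2": {"e1", "e2"}, "t3": {"e3"},
    "t4": {"e1"}, "t5": {"e1", "e4"}, "t6": {"e2", "e3"},
}
cost = {t: 1 for t in coverage}
reduced = reduce_tests(coverage, cost)
print(reduced)
```

Tests t3 and t4 disappear because t1 covers everything they cover at the same cost; removing them cannot worsen any point of the Pareto front.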

Reduction in the Number of Test Cases
Task: complete the table
With the help of reduceInstance, complete the table.
Instance      Tests  Reduced tests
printtokens1  4130
printtokens2  4115
replace       5542
schedule      2650
schedule2     2710
tcas          1608
totinfo       1052
How long does it take now to compute the Pareto front? Is it the same?

Refactoring
Semantic-preserving change in the code
G29: Avoid Negative Conditionals
Negatives are just a bit harder to understand than positives. So, when possible, conditionals should be expressed as positives. For example:
    if (buffer.shouldCompact())
is preferable to
    if (!buffer.shouldNotCompact())
G30: Functions Should Do One Thing
It is often tempting to create functions that have multiple sections that perform a series of operations. Functions of this kind do more than one thing, and should be converted into many smaller functions, each of which does one thing.
For example:
    public void pay() {
      for (Employee e : employees) {
        if (e.isPayday()) {
          Money pay = e.calculatePay();
          e.deliverPay(pay);
        }
      }
    }
This bit of code does three things. It loops over all the employees, checks to see whether each employee ought to be paid, and then pays the employee. This code would be better written as:
    public void pay() {
      for (Employee e : employees)
        payIfNecessary(e);
    }
    private void payIfNecessary(Employee e) {
      if (e.isPayday())
        calculateAndDeliverPay(e);
    }
    private void calculateAndDeliverPay(Employee e) {
      Money pay = e.calculatePay();
      e.deliverPay(pay);
    }
Each of these functions does one thing. (See “Do One Thing” on page 35.)
G31: Hidden Temporal Couplings
Temporal couplings are often necessary, but you should not hide the coupling. [...]

Anti-pattern
Common solution to a problem with bad consequences

Automatic Refactoring
Boolean logic is hard enough to understand without having to see it in the context of an if or while statement. Extract functions that explain the intent of the conditional.
For example:
    if (shouldBeDeleted(timer))
is preferable to
    if (timer.hasExpired() && !timer.isRecurrent())

Example
Excerpts from R. Morales et al. (cropped on the slide):
[...]ential dependency conflicts and mutual exclusion [...] more on these two kind of conflicts in the fol[...] belongs to class B instead, if A is a subclass of B.
To better illustrate the refactoring scheduling problem, and the effect that the consideration of dependencies and conflicts between refactorings has on the size of the search-space, we present an example of [...]
Listing 1. Example of classes to be refactored.
[...] reduce even more the search-space by removing these permutations as they lead to the same design (same solution). This occurs because they affect different code segments (the method and target class is different for r1 and r3), i.e., they are unrelated.
In addition, when a conflict exists between refactorings, it is possible to reduce the size of the search space further. For example, consider the sequential dependency conflict between r1, r2, that is r2 cannot be applied before r1 (inlining class Rectangle invalidates any move method refactoring from/to that class). Hence, by removing redundant solutions, and invalid solutions (solutions with elements that are conflicted) we can reduce the search-space size of the motivating example by half (sequences 1, 2, 3, 4, 5, 6, 8 and 11). Thus, the value obtained after applying Eq. (2) should be used as an upper bound of the search-space size, as long as we assume that applying a refactoring sequence [...]
[...] code-analyses with typically 100% precision and recall for associations and a high precision and recall for aggregations. Composition relationships cannot be entirely identified statically because they involve the lifetime of the instances of the classes involved in such relationships. Hence, idiom-level models include association and aggregation relationships and only the few composition relationships that can be identified with high precision and recall statically. A design-level model contains information about occurrences of design motifs, code smells, [...]
Table 1
List of refactorings candidates for the example from Listing 1.
ID  Type                        Source class  Method                   Target class
r1  Move method                 Geometry      calcAreaRectangle        Rectangle
r2  Inline Class                Rectangle     All fields and methods   Shape
r3  Introduce Parameter Object  Geometry      longParameterListMethod  GeometryParamObj (new)
Table 2
Enumeration of possible refactoring sequences for the set of refactoring operations {r1, r2, r3}.
sequence  elements    sequence  elements
1.        None        9.        r3, r1
2.        r1          10.       r3, r2
3.        r2          11.       r1, r2, r3
4.        r3          12.       r1, r3, r2
5.        r1, r2      13.       r2, r1, r3
6.        r1, r3      14.       r2, r3, r1
7.        r2, r1      15.       r3, r2, r1
8.        r2, r3      16.       r3, r1, r2
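The reduction from 16 sequences to 8 can be reproduced with a small sketch. It is not the RePOR algorithm: it transcribes Table 2, discards the sequences in which r2 comes before r1, and then keeps one representative per set of refactorings, which assumes that any two valid orderings of the same refactorings lead to the same design. The excerpt states this explicitly only for r1 and r3; extending it to r3 and r2 is an assumption that happens to match the sequence numbers reported in the excerpt.

```python
# Table 2, transcribed: sequence number -> ordered refactorings.
TABLE_2 = {
    1: (), 2: ("r1",), 3: ("r2",), 4: ("r3",),
    5: ("r1", "r2"), 6: ("r1", "r3"), 7: ("r2", "r1"), 8: ("r2", "r3"),
    9: ("r3", "r1"), 10: ("r3", "r2"), 11: ("r1", "r2", "r3"),
    12: ("r1", "r3", "r2"), 13: ("r2", "r1", "r3"), 14: ("r2", "r3", "r1"),
    15: ("r3", "r2", "r1"), 16: ("r3", "r1", "r2"),
}
MUST_PRECEDE = {("r1", "r2")}  # r2 cannot be applied before r1


def valid(seq):
    """True when no pair (a, b) of MUST_PRECEDE appears with b before a."""
    return all(seq.index(a) < seq.index(b)
               for a, b in MUST_PRECEDE if a in seq and b in seq)


def reduce_sequences(table):
    """Keep the lowest-numbered valid sequence for each set of refactorings."""
    kept = {}
    for number in sorted(table):
        seq = table[number]
        if valid(seq):
            kept.setdefault(frozenset(seq), number)
    return sorted(kept.values())


remaining = reduce_sequences(TABLE_2)
print(remaining)
```

Under these assumptions the surviving sequences are exactly one per subset of {r1, r2, r3}, that is 2^3 = 8 of the 16 orderings.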
i.e., the (1) detection of classes that contain anti-patterns; (2) the
generation of refactoring candidates to improve the design quality of
the classes detected in (1); (3) the search for an optimal refactoring
order; and (4) the application of the refactoring order from (3). To
achieve this goal, we propose a new heuristic approach called RePOR
(Refactoring approach based on Partial Order Reduction). Partial order
reduction is a popular technique for controlling state space explosion in
model checking (Lluch-Lafuente et al., 2002). The intuition is to reduce
the number of refactoring sequences to be explored by removing
equivalent sequences (i.e., refactoring sequences that leads to the same
design). As a result, less search effort is required than when using
metaheuristic algorithms. To evaluate RePOR, we conduct a series of
experiments over a testbed of five open source software systems (OSS)
and compare the results with Genetic Algorithm (GA) (Holland, 1975),
Ant Colony optimization (ACO) (Dorigo et al., 2006), the conflict-aware
refactoring scheduling approach proposed by Liu et al. (2008) (referred
to as LIU in this paper), and a new optimizer based on sampling (SWAY)
(Chen et al., 2018). We show that the solutions obtained by RePOR
outperform those obtained by the above-mentioned state-of-the-art
optimization techniques in terms of performance (i.e., execution time)
and effort (i.e., number of refactorings applied).
Tool and Data Replication. The Eclipse Plug-in and all the data
used in the experiments are available on the RePOR replication package
(Morales et al., 2017b).
The remainder of the paper is organized as follows: Section 2 discusses
the formulation of the refactoring scheduling problem, and describes
how to reduce the search-space size using partial order reduction.
Section 3 describes RePOR in detail. Section 4 presents the case
study for evaluating our approach. Section 5 presents and discusses the
results obtained in our case study. Section 7 discloses the threats to the
validity of our study. Related work is discussed in Section 8. Finally, we
present our conclusions and lay out directions for future work in
Section 9.
2. Formulation of the refactoring scheduling problem
As a software system ages, its design quality deteriorates unless it is
continually maintained (Parnas, 1994). Refactoring is a software
maintenance activity that aims to keep the design quality of a software
system at an acceptable level, in order to ensure a normal evolution of
the system. Typically, refactoring is performed by applying small
transformation operations (e.g., moving a method/field to another
class) to a software system while preserving its original behavior. Since
there is a wide range of candidate refactorings that can be applied on a
system, depending on the domain of the system, an optimal solution
may be comprised of several refactorings that improve different quality
attributes. Hence, the refactoring scheduling problem consists of
finding the best combination of refactorings that maximizes the design
quality improvement of a software system. The problem of finding an
optimal order can be solved using search-based techniques. Search algorithms
start by generating one or more random sequences. Next, the
quality of each sequence is computed by applying it to the software
the number of occurrences of an
The outcome of Q(SR) is a nega
moves anti-patterns; zero if the
same, and positive otherwise. T
lated to the presence and the or
Hence, we suggest that refac
on the classes that they affect. I
parately. Since the order of app
ferent classes in a sequence is irr
refactoring operations that we n
that we have a set of refact
According to Morales et al. (2
quences (S) that we could gener
given by Eq. (2).
S = \begin{cases} \lfloor e \cdot n! \rfloor & \forall n \geq 1 \\ 1 & n = 0 \end{cases} \qquad (2)
where e is the Euler constant,
available.
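The count given by Eq. (2) can be checked by brute force. The following is a minimal Python sketch, not code from the paper: it assumes a sequence is any ordered selection without repetition, of any length from 0 to n, and the function names are hypothetical. Under that assumption two refactorings give 5 sequences and three give 16, which matches the enumeration in Table 2.

```python
from itertools import permutations
from math import e, factorial, floor


def count_sequences(n: int) -> int:
    """Number of ordered sequences without repetition over n refactorings."""
    # Sum, over every length k, of the k-permutations of n items: n! / (n - k)!
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))


def enumerate_sequences(refactorings):
    """List every sequence explicitly, from the empty one to the full permutations."""
    items = list(refactorings)
    return [seq for k in range(len(items) + 1) for seq in permutations(items, k)]


all_sequences = enumerate_sequences(["r1", "r2", "r3"])
print(len(all_sequences))                            # size of the explicit enumeration
print(count_sequences(3), floor(e * factorial(3)))   # closed-form count, two ways
```

The exact integer sum avoids the floating-point rounding that floor(e · n!) would eventually suffer for large n.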
Applying Eq. (2) to our ex
(⌊e·2!⌋ = 5): < > , < A > , < B
if (iff) we assume that each per
(here the term solution refers to
sequence to a system, i.e., the re
and < B, A > are two different
and only 4 different solutions ex
In the case of refactorings th
design may vary depending on
factorings, as the application of
the rest of refactorings. We can
factorings as an undirected graph
ru, rv ∈ Rk. k ∈ K, where K is the
set of refactorings that affect cla
graph, is linked to the structure o
refactorings modify a class, an
factorings affect the number of
after refactoring.
We use GB to find the conne
component is a maximal subgra
connected by a path. Connecte
over the refactoring operations.
reduction from model checking (L
the removal of sequences of refa
Partial order reduction (POR) i
tativity of asynchronous systems
concurrent models impose an a
events, refactoring scheduling im
refactoring operations. The orde
instructions is meaningless (as th
Hence, we can consider just o
property since the other ordering
to construct a reduced state gra
Are all permutations relevant?

Q(SR) = \sum_{k \in K} Q(sr_k); \quad \text{with } Q(sr_k) = AC(k') - AC(k) \qquad (1)
In Eq. (1), SR is a subset of R; R is the set of refactorin
applied in a system SYS; K is the set of classes in SYS, K ∈ SYS
subset of SR that modifies class k (k ∈ K). Each sub-function
computed by subtracting the number of occurrences of anti-pa
class k after applying srk to k (i.e., AC(k′)) and the number o
rences of anti-patterns before refactoring (i.e., AC(k)). Note tha
the number of occurrences of anti-patterns as a proxy of design
The outcome of Q(SR) is a negative value when applying SR
moves anti-patterns; zero if the number of anti-patterns rem
same, and positive otherwise. The quality effect of applying
Objective Function
Class after refactoring
Class before refactoring
Anti-patterns count
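As a small illustration of the objective function in Eq. (1), the sketch below sums the per-class differences AC(k′) − AC(k). It is only an assumption-laden example: the class names and anti-pattern counts are hypothetical, and `quality_effect` is an invented helper name, not part of any tool mentioned in the slides.

```python
def quality_effect(ac_before, ac_after):
    """Q(SR): sum over the classes k of AC(k') - AC(k)."""
    # A class missing from one of the dictionaries is counted with 0 there if it
    # is new or removed, and as unchanged if it is simply absent from ac_after.
    total = 0
    for k in set(ac_before) | set(ac_after):
        before = ac_before.get(k, 0)
        after = ac_after.get(k, before)
        total += after - before
    return total


# Hypothetical anti-pattern counts per class, before and after applying SR.
ac_before = {"Geometry": 2, "Rectangle": 1, "Shape": 0}
ac_after = {"Geometry": 0, "Rectangle": 1, "Shape": 0}

q = quality_effect(ac_before, ac_after)
print(q)  # a negative value means that SR removed anti-patterns
```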
me conclusions and future work.
UDO-BOOLEAN OPTIMIZATION
hod for identifying improving moves in the radius
g ball can be applied to all k-bounded pseudo-
ptimization problems. This makes our method
al: every compressible pseudo-Boolean Optimiza-
m can be transformed into a quadratic pseudo-
ptimization problem with k = 2.
ily of k-bounded pseudo-Boolean Optimization
ave also been described as an embedded landscape.
ed landscape [3] with bounded epistasis k is de-
function f(x) that can be written as the sum
nctions, each one depending at most on k input
That is:
f(x) = \sum_{i=1}^{m} f^{(i)}(x), \qquad (1)
subfunctions f^{(i)} depend only on k components
dded Landscapes generalize NK-landscapes and
SAT problem. We will consider in this paper that
of subfunctions is linear in n, that is m ∈ O(n).
dscapes m = n and is a common assumption in
T that m ∈ O(n).
subfunctions f . Let us define w
such that the i-th element of wl is
on variable xi. The vector wl ca
that characterizes the variables t
has bounded epistasis k, the num
with |wl|, is at most k. By the
equalities immediately follow.
f(l)
(x v) = f(l)
(x) for all v
S(l)
v (x) =
⇢
0 if w
S
(l)
v^wl
(x) othe
Equation (5) claims that if n
change in the move characterize
f(l)
the Score of this subfunction
this subfunction will not change f
On the other hand, if f^(l) depend
we only need to consider for the
changed variables that affect f^(l)
acterized by the mask vector v ∧
we can write (3) as:
Sv(x) =
mX
l=1
wl^v6=0
f = f(1)(x) + f(2)(x) + f(3)(x) + f(4)(x)
The structure is well-known in optimization…
[Figure: Variable Interaction Graph over the variables x1, x2, x3, x4]

Objective Function
[Figure: graph over the variables x1 to x6]
If the variable interaction graph has several connected components, we can
optimize each of them independently
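The idea of optimizing each connected component on its own can be sketched in Python as follows. This is an illustrative example, not code from the workshop: the objective is a hypothetical sum of four subfunctions over six binary variables, and the helper names are assumptions. Two variables interact when they appear in the same subfunction.

```python
from itertools import product


def connected_components(n_vars, subfunctions):
    """Components of the variable interaction graph, found with union-find."""
    parent = list(range(n_vars))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Variables that appear in the same subfunction interact, so join them.
    for variables, _ in subfunctions:
        for v in variables[1:]:
            parent[find(v)] = find(variables[0])

    groups = {}
    for i in range(n_vars):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def minimize_by_components(n_vars, subfunctions):
    """Brute-force each component separately and merge the partial optima."""
    best = [0] * n_vars
    for component in connected_components(n_vars, subfunctions):
        members = set(component)
        local = [(vs, f) for vs, f in subfunctions if set(vs) <= members]

        def local_value(bits):
            x = dict(zip(component, bits))
            return sum(f(*(x[v] for v in vs)) for vs, f in local)

        bits = min(product((0, 1), repeat=len(component)), key=local_value)
        for var, bit in zip(component, bits):
            best[var] = bit
    return best


# Hypothetical objective: each entry is (variables it depends on, subfunction).
subfunctions = [
    ((0, 1), lambda a, b: a + b - 2 * a * b),
    ((1, 2), lambda b, c: 1 - b * c),
    ((3, 4), lambda d, g: d - g),
    ((5,), lambda h: 1 - h),
]


def total(x):
    return sum(f(*(x[v] for v in vs)) for vs, f in subfunctions)


best = minimize_by_components(6, subfunctions)
print(best, total(best))
```

Each component is searched over 2^(its size) assignments instead of 2^6 for the whole problem, which is where the saving comes from.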

Dependency Graph (GB)
[Figure: graph over the refactorings r1 to r6]
Two refactoring operations are adjacent in GB when both touch the same class
We can optimize each connected component of GB independently, exploring all the
possible sequences in the component
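A dependency graph of this kind can be built directly from the set of classes that each refactoring touches. The sketch below is a hedged illustration: the refactorings r1 to r6 and the classes A to D are hypothetical (they are not the workshop example), and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical data: the classes touched by each refactoring operation.
touched = {
    "r1": {"A"},
    "r2": {"A", "B"},
    "r3": {"B"},
    "r4": {"C"},
    "r5": {"C", "D"},
    "r6": {"D"},
}


def build_dependency_graph(touched):
    """Undirected graph: two refactorings are adjacent if they share a class."""
    graph = {r: set() for r in touched}
    for ru, rv in combinations(touched, 2):
        if touched[ru] & touched[rv]:
            graph[ru].add(rv)
            graph[rv].add(ru)
    return graph


def components(graph):
    """Connected components, found with an iterative depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


gb = build_dependency_graph(touched)
print(components(gb))
```

With this data the six refactorings split into two components of three, so at most 16 + 16 sequences need to be examined instead of every sequence over six operations.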

Dependency Graph (GB): example
What is the dependency graph in our example?
kinds of conflicts: sequential dependency conflicts and mutual exclusion
conflicts. We elaborate on these two kinds of conflicts in the following.
• Given two refactorings ri and rj, ri has a sequential dependency
conflict with rj iff rj cannot be applied before ri. We represent
sequential dependency conflicts as follows: r1 → r2, which means that
r1 can be followed by r2, but r2 cannot be followed by r1. Note that
conflicts are directional, i.e., the fact that applying rj disables ri does
not necessarily mean that ri disables rj.
• Given two refactorings ri and rj, ri has a mutual exclusion conflict
with rj iff ri and rj cannot be applied together in any order. We
represent mutual exclusion with the following notation: r1 ¬↔ r2.
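The two kinds of conflicts above can be turned into a validity check for a candidate sequence. The following Python sketch makes two assumptions that are not stated in the source: "r2 cannot be followed by r1" is read as "r1 may not appear anywhere after r2", and the concrete conflicts used in the example are hypothetical.

```python
from itertools import permutations


def is_valid(sequence, sequential, exclusive):
    """Check a sequence against sequential and mutual exclusion conflicts."""
    position = {r: i for i, r in enumerate(sequence)}
    for first, second in sequential:
        # first -> second: `second` may come after `first`, never before it.
        if first in position and second in position:
            if position[second] < position[first]:
                return False
    for a, b in exclusive:
        # Mutually exclusive refactorings cannot share a sequence in any order.
        if a in position and b in position:
            return False
    return True


# Hypothetical conflicts: r1 -> r2, and r2 mutually exclusive with r3.
sequential = {("r1", "r2")}
exclusive = {("r2", "r3")}

candidates = [p for k in range(4) for p in permutations(["r1", "r2", "r3"], k)]
valid = [p for p in candidates if is_valid(p, sequential, exclusive)]
print(len(candidates), len(valid))
```

Filtering with the conflicts shrinks the set of sequences that still have to be evaluated.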
the problem in Listing 1.
The refactorings presented in Table 1 can be applied to refactor the
classes described in Listing 1.
Table 1 contains three types of refactorings from Fowler (1999b) that
we describe below:
1. Move method. Move a method from one class to another (e.g., to one
of its parameter types (Seng et al., 2006)).
2. Inline Class. If a class contains few responsibilities, move all its
features to another class and remove it.
and anti-patterns. A code meta-model should provide methods to manipulate
the design model and generate other models. The objective of
this step is to manipulate the design model of a system programmatically.
Hence, the code meta-model is used to detect anti-patterns,
apply refactoring sequences and evaluate their impact on the design
quality of a system. More information related to code meta-models,
design motifs and micro-architecture identification can be found in
Guéhéneuc and Albin-Amiot (2004) and Guéhéneuc and
Antoniol (2008).
3.2. Step 2: detect anti-patterns
In this step we detect anti-patterns in the meta-model using any
Table 1
Table 2
1. None 9. r3, r1
2. r1 10. r3, r2
3. r2 11. r1, r2, r3
4. r3 12. r1, r3, r2
5. r1, r2 13. r2, r1, r3
6. r1, r3 14. r2, r3, r1
7. r2, r1 15. r3, r2, r1
8. r2, r3 16. r3, r1, r2
Task: find the dependency graph over the refactorings r1, r2 and r3


Conflict Graph (GC)
[Figure: graph over the refactorings r1 to r6; legend: sequential dependency conflict, mutual exclusion conflict]
The conflict graph is used to reduce the number of sequences to explore in each
component

Conflict Graph (GC): example
What is the conflict graph in our example?
Task: find the conflict graph over the refactorings r1, r2 and r3

Input: System to refactor (SYS), maximum number of refactoring operations in a connected component subgraph (threshold)
Output: An optimal sequence of refactoring operations (SR)
Require Proc: extractBestPermutation, getFirstValidSequenceFromccap
Steps RePOR(SYS, threshold)
    AM = code meta-model generation(SYS)
    A = Detect Anti-patterns(AM)
    R = Generate set of refactoring candidates(AM, A)
    GB = Build Graph of dependencies between refactorings and anti-patterns(AM, R, A)
    CCAP = Find connected components(GB)
    GC = Build Graph of conflicts between refactorings(AM, LR)
    SR = Schedule sequence of refactorings(CCAP, GC, AM)
Procedure Schedule sequence of refactorings(CCAP, GC, AM):
    SR = ∅
    for each ccap ∈ CCAP do
        ccap.RemoveInvalidRefactorings(SR)
        if ccap.size == 0 then
            continue
        else
            List permuts = enumeratePermutations(ccap)
            if permuts ≤ threshold then
                SR.addAll(extractBestPermutation(AM, GC, permuts))
            else
                SR.addAll(getFirstValidSequenceFromccap(AM, GC, ccap, R))
            end if
        end if
    end for
    return SR
end
Algorithm 1. RePOR.
RePOR
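The scheduling procedure of Algorithm 1 can be approximated with the simplified Python sketch below. It is not the authors' implementation. It assumes that the threshold is compared with the size of the component, that small components are searched over all valid sequences of any length, that large components fall back to a greedy first valid sequence, and that lower quality values are better (as with Q in Eq. (1)). The conflicts, the per-refactoring gains and all function names are hypothetical, and the step that removes refactorings invalidated by the sequence built so far is omitted.

```python
from itertools import permutations


def schedule(components, threshold, is_valid, quality):
    """Concatenate one refactoring sequence per connected component of GB."""
    sr = []
    for ccap in components:
        if not ccap:
            continue
        if len(ccap) <= threshold:
            # Small component: exhaustively evaluate every valid ordering.
            candidates = [list(p) for k in range(len(ccap) + 1)
                          for p in permutations(ccap, k) if is_valid(p)]
            best = min(candidates, key=quality)
        else:
            # Large component: greedily keep each refactoring that stays valid.
            best = []
            for r in ccap:
                if is_valid(best + [r]):
                    best.append(r)
        sr.extend(best)
    return sr


# Hypothetical conflicts: r1 -> r2; r2 excludes r3; r4 excludes r5.
ORDER = {("r1", "r2")}
EXCLUSIVE = {("r2", "r3"), ("r4", "r5")}
# Hypothetical change in anti-pattern count caused by each refactoring.
GAIN = {"r1": -1, "r2": -2, "r3": -1, "r4": -1, "r5": -1, "r6": -1, "r7": -1}


def is_valid(seq):
    pos = {r: i for i, r in enumerate(seq)}
    ordered = all(pos[a] < pos[b] for a, b in ORDER if a in pos and b in pos)
    compatible = not any(a in pos and b in pos for a, b in EXCLUSIVE)
    return ordered and compatible


def quality(seq):
    return sum(GAIN[r] for r in seq)


plan = schedule([["r1", "r2", "r3"], ["r4", "r5", "r6", "r7"]], 3, is_valid, quality)
print(plan)
```

With a threshold of 3, the first component is solved exhaustively and the second one greedily.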

Experimental Setup
Subjects
Tools
• PADL to create a high level model of the software
• DECOR to detect and correct anti-patterns on the model
In Table 4 we describe the type of anti-patterns studied and
refactoring strategies used to remove them. Table 5 shows the num
of refactoring candidates that were automatically found in each sys
4.3. RePOR implementation
We instantiate RePOR as an eclipse plug-in and compared it
three refactoring approaches. Design improvement (DI) is meas
using Eq. (3). To determine the value of the parameter thres
Listing 2. Rule card of Blob anti-pattern from DECOR.
Table 3
Descriptive statistics about the studied systems.
System                NOC    KLOC   BL    LC   LP    SC   SG   Total
Apache Ant 1.8.2      697    191    57    40   35    3    6    141
ArgoUML 0.34          1754   183    131   25   281   1    19   457
GanttProject 1.10.2   188    44     47    4    68    5    6    130
JfreeChart 1.0.19     505    98     41    21   62    1    1    126
Xerces 2.7            540    71     56    25   119   2    3    205
Table 4
List of studied anti-patterns and the refactorings used to correct them.
Blob (BL) (Brown et al., 1998)
    Description: A large class that absorbs most of the functionality of the system with
    very low cohesion between its constituents.
    Refactoring strategy: Move method (MM). Move the methods that do not seem to fit in
    Blob class abstraction to more appropriate classes (Seng et al., 200
Lazy Class (LC) (Fowler, 1999a)
    Description: Small classes with low complexity that do not justify their existence
    in the system.
    Refactoring strategy: Inline class (IC). Move the attributes and methods of the LC to anot
    class in the system.
Long Parameter List (LP) (Fowler, 1999a)
    Description: A class with one or more methods having a long list of parameters,
    especially when two or more methods are sharing a long list of
    parameters that are semantically connected.
    Refactoring strategy: Introduce parameter object (IPO). Extract a new class with the long
    of parameters and replace the method signature by a reference to
    new object created. Then access these parameters through the
    parameter object.
Spaghetti Code (SC) (Brown et al., 1998)
    Description: A class without structure that declares long methods without
    parameters.
    Refactoring strategy: Replace method with method object (RMWO). Extract long methods i
    new classes so all local variables become fields on that object.
Speculative Generality (SG)
    Description: There is an abstract class created to anticipate further features, but it
    Refactoring strategy: Collapse hierarchy (CH). Move the attributes and methods of the ch
described in Section 3.7, we ran 30 independent executions for
each of the systems studied on a Windows 10 64-bit machine with an Intel Core 5 at 2.30
GHz and 12 GB of memory, recorded the size of ccap for which the
performance of RePOR is acceptable, and found threshold = 10 to be the
best trade-off. This value of threshold means that, in our experiments, we
only exhaustively explore the permutations of a ccap containing 10 or
fewer refactoring operations, and evaluate the resulting permutations
only after removing any conflicting refactoring operation.
The directed graph of conflicts (GC) is used for the three metaheuristics
to avoid scheduling invalid refactorings. Due to the random
nature of the metaheuristics studied (i.e., ACO, GA, and SWAY) it is necessary
to perform several independent runs to have an idea of the
behavior of the algorithms. Hence, we execute 30 independent runs for
all the approaches studied and for each system. This is a typical
minimum value (i.e., 30 runs) used in the search-based research com-
Table 5
Number of refactoring candidates automatically generated for each studied
system.
System          CH   IC   IPO   MM     RMWO   Total
Ant             6    9    35    4269   3      4322
ArgoUML         19   25   281   2475   1      2800
Gantt Project   6    4    68    3861   5      3944
JfreeChart      1    21   62    4228   1      4313
Xerces          3    25   119   4118   2      4267

Experimental Setup
Performance measures
• Design Improvement
• Execution time (ET): runtime of algorithms
• Refactoring Effort (RE): number of refactoring operations in the sequence
ment.1
For all statistical tests, we consider a significance level of
For RQ1, we measure the effectiveness of RePOR at removing a
patterns in software systems using the following dependent variable
• Design Improvement (DI). DI represents the delta of anti-patte
occurrences between the refactored system (SYS′) and the orig
system (SYS) and it is computed using the following formulatio
DI(SYS) = \frac{AC(SYS') - AC(SYS)}{AC(SYS)} \times 100. \qquad (3)
Where AC(SYS) is the number of anti-patterns in a system SYS
AC(SYS) ≥ 0. DI, which is a positive real number, represents
improvement amount in percentage, and high positive values
desired. Note that Eq. (3) assumes that ′ − <AC SYS AC SYS( ) ( ) 0
RePOR filters out solutions that make the design worse accordin
the desiredEffect threshold (cf., Algorithm 4).
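As a worked example of the DI measure, the Python sketch below computes the percentage of anti-patterns removed. Two points are assumptions rather than facts from the source: since the text describes DI as a positive percentage, the sketch reports the reduction AC(SYS) − AC(SYS′) relative to AC(SYS); and the value of 56 remaining anti-patterns is back-computed for illustration from the 141 anti-patterns of Apache Ant (Table 3) and the 60.28% reported for RePOR (Table 7), not a figure given in the paper.

```python
def design_improvement(ac_before: int, ac_after: int) -> float:
    """Percentage of anti-pattern occurrences removed by refactoring."""
    if ac_before <= 0:
        raise ValueError("DI is undefined when the original system has no anti-patterns")
    # Assumed sign convention: positive when anti-patterns are removed.
    return (ac_before - ac_after) / ac_before * 100


# 141 anti-patterns before (Apache Ant, Table 3); 56 after is illustrative.
di_ant = design_improvement(141, 56)
print(round(di_ant, 2))
```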
The independent variable is the refactoring approach applied
each studied system. We statistically compare the number of
maining anti-patterns after refactoring a system using RePOR w
the number of remaining anti-patterns when using other refactor
approaches. Specifically, we test the following hypothesis H01: Th
is no difference between the number of remaining anti-patterns o
system refactored using RePOR, and a system refactored using o
refactoring approaches. We test the hypothesis using a non-p
metric test, i.e., the Mann–Whitney U test (Hollander et al., 201
For estimating the magnitude of the differences of means betw
Algorithms
• RePOR
• Conflict-aware scheduling of refactoring heuristic by Liu et al. (2008) (LIU)
• Ant Colony Optimization (ACO)
• Genetic Algorithm (GA)
• SWAY metaheuristic by Chen et al. (2018)

Results
RQ1: To what extent can RePOR remove anti-patterns?
We present in Table 7 the Design improvement (DI) in general and
the rest of the systems.
We reject the null hypothesis H01 for Ant, ArgoUML, Gantt,
JfreeChart, and Xerces. In these five systems, the number of remaining
anti-patterns after refactoring using RePOR is significantly
lower than the number of anti-patterns remaining in the systems
after refactoring using the other refactoring approaches (i.e., ACO,
Table 7
Design Improvement (%) in general and for different anti-pattern types.
Metaheuristic   DI      DI_BL   DI_LC   DI_LP   DI_SC   DI_SG
Ant
ACO             57.45   68.42   22.5    74.29   66.67   100
GA              58.16   68.42   22.5    74.29   66.67   100
LIU             58.87   54.39   22.5    100     66.67   100
RePOR           60.28   57.89   22.5    100     66.67   100
SWAY            45.36   57.89   20      60      66.67   83.33
ArgoUML
ACO             75.93   51.15   100     83.63   100     100
GA              76.59   51.15   100     84.7    100     100
LIU             81.40   50.38   100     92.88   100     100
RePOR           81.62   38.93   100     98.58   100     100
SWAY            62.91   48.09   84      66.01   100     86.84
Gantt Project
ACO             60      17.02   100     83.82   70      100
GA              60.77   14.89   100     85.29   80      100
LIU             63.85   14.89   100     92.65   60      100
RePOR           66.15   8.51    75      100     100     100
SWAY            50      8.51    100     70.59   60      100
JfreeChart
ACO             75.4    39.02   100     89.52   100     100
GA              75.4    39.02   100     90.32   100     100
LIU             72.22   31.71   100     88.71   100     100
RePOR           75.4    24.39   100     100     100     100
SWAY            61.90   36.59   90.48   73.39   100     100
Xerces
ACO             56.59   14.29   100     65.55   100     100
GA              57.56   14.29   100     67.23   100     100
LIU             64.39   16.07   100     78.99   50      100
RePOR           73.17   5.36    100     98.32   100     100
SWAY            41.87   14.29   68.00   49.58   50      100
Table 8
Pair-wise Mann–Whitney U Test for design improvement.
Pair          p-value        Cliff's δ     Magnitude
Ant
ACO-RePOR     2.561349e−12   1             Large
GA-RePOR      1.431438e−11   1             Large
LIU-RePOR     1.685298e−14   1             Large
SWAY-RePOR    1.190193e−12   1             Large
ArgoUML
Gantt Project
JfreeChart
ACO-RePOR     0.06868602     0.2333333     Small
GA-RePOR      0.2771456      −0.1333333    Negligible
Xerces
