SlideShare a Scribd company logo
1 of 138
Download to read offline
CONCEPTS EXTRACTION FROM
EXECUTION TRACES
Montréal, November 17th 2014
Soumaya Medini
!
Ph.D. Defense
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
2
Context
• Software maintenance effort is estimated to
be more than 70% of the overall software
cost. [Ian Sommerville, 2000]
• Program comprehension require half of the
effort devoted to software maintenance and
evolution. [Dehaghani et Hajrahimi, 2013]
3
Maintenance
Integration
Unit Testing
Coding
Design
Specifications
Requirements
Software Life-Cycle Costs
• Understand a program: identify which concept
this program implements.
• Concept location aims at identifying concepts
and locating them within code regions.
• A concept represents a functionality of a
program.
4
Context
Motivation
• A typical scenario in which concept location
takes part:
5
Motivation
• A typical scenario in which concept location
takes part:
5
Program Failure
Motivation
• A typical scenario in which concept location
takes part:
5
Program Failure
Motivation
• A typical scenario in which concept location
takes part:
Execution Trace
5
Program Failure
Motivation
• A typical scenario in which concept location
takes part:
Execution Trace
5
Program Failure
Analysing the one execution
trace to identify the sequence of
methods producing the failure.
Problem Statement
• Large and noisy:
• Execution trace corresponding to draw a
rectangle in JHotDraw contains 4,000
method calls.
6
Problem Statement
• Several approaches address these problems:
• Compacting execution traces
(encoding the whole execution as a directed acyclic graph)
[Reiss and Renieris, 2001]
• Building high-level behavioural models
(detecting and filtering utilities) [Hamou-Lhadj et al., 2005]
• Segmenting execution traces
(textual analysis or clustering algorithms) [Asadi et al., 2010] [Pirzadeh and
Hamou-Lhadj, 2011]
7
Problem Statement
• Several approaches address these problems:
• Compacting execution traces
(encoding the whole execution as a directed acyclic graph)
[Reiss and Renieris, 2001]
• Building high-level behavioural models
(detecting and filtering utilities) [Hamou-Lhadj et al., 2005]
• Segmenting execution traces
(textual analysis or clustering algorithms) [Asadi et al., 2010] [Pirzadeh and
Hamou-Lhadj, 2011]
7
None of these approaches guide
developers towards segments
that implements the concepts to
maintain.
Thesis
8
Identify concepts and facilitate the analysis
of large execution traces for maintenance
tasks.
9
Thesis
9
Trace
Segmentation
Thesis
9
?
?
?
?
Trace
Segmentation
Thesis
9
Trace
Segmentation
Segments
Labelling
Thesis
9
Trace
Segmentation
Segments
Labelling
Thesis
9
Trace
Segmentation
Segments
Labelling
Thesis
9
Trace
Segmentation
Segments
Labelling
Segments
Relations
Thesis
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
10
Trace Segmentation
• Asadi et al. [2010]: identify concepts in
execution trace by finding cohesive and
decoupled fragments of the trace using
Genetic Algorithm (GA).
• Limitations:
• Not scalable (7 hours).
• Stability problems (different segmentation).
11
Background
• Steps:
1. System instrumentation and trace collection;
2. Pruning and compressing traces;
3. Textual analysis of method source code;
4. Trace splitting using optimization
techniques.
12
Step1: Program instrumentation and
trace collection
• We collect and tag traces.
13
Step2: Pruning and compressing
traces
• Pruning: Remove too frequent method
invocations.
• Compressing: Remove repetitions.
14
Step3: Textual analysis of Method
source code
• Extract identifiers from source code and
comments.
• Split identifiers using Camel-Case
(getBook get and book).
• Perform stemming (waited,waiting,waits wait).
• Remove programming language keywords and
english stop words.
• Index terms and documents using the TF-IDF
indexing mechanisms and apply LSI.
15
Step4: Trace Splitting through
optimization techniques
• Execution trace segmentation solution must be
found in large search spaces.
• We must apply some optimization techniques to
segment the trace.
• Approach built upon a dynamic programming
algorithm to:
• Improve scalability;
• Compute the exact splitting.
16
Dynamic Programming (DP)
Approach
• Solve a problem by dividing the problem into
sub-problems that are recursively solved.
• The solution of the problem: combining the
solutions of the sub-problems.
• The quality of the segmentation of a trace into K
segments:
17
• Cohesion
Cohesion and Coupling
18
• Cohesion
Cohesion and Coupling
18
• Cohesion
Cohesion and Coupling
18
• Cohesion
• Coupling
Cohesion and Coupling
18
• Example of trace segmentation using DP.
19
Dynamic Programming (DP)
Approach
• Example of trace segmentation using DP.
• Create a new segment.
19
Dynamic Programming (DP)
Approach
• Example of trace segmentation using DP.
• Create a new segment.
• Add the method to the last segment.
19
Dynamic Programming (DP)
Approach
Case Study Design
• Research Questions:
• RQ1: How do the performances of the GA
and DP approaches compare?
• RQ2: How do the GA and DP approaches
perform?
• Programs:
20
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
21
Programs Scenarios
Number of
Segments
Fitness Time (s)
GA DP GA DP GA DP
ArgoUML
Create Note 24 13 0.54 0.58 7,080 2.13
Create Class, Create Note 73 19 0.52 0.60 10,800 4.33
JHotDraw
Draw Rectangle 17 21 0.39 0.67 2,040 0.13
Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64
Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86
Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
21
Programs Scenarios
Number of
Segments
Fitness Time (s)
GA DP GA DP GA DP
ArgoUML
Create Note 24 13 0.54 0.58 7,080 2.13
Create Class, Create Note 73 19 0.52 0.60 10,800 4.33
JHotDraw
Draw Rectangle 17 21 0.39 0.67 2,040 0.13
Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64
Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86
Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
21
Programs Scenarios
Number of
Segments
Fitness Time (s)
GA DP GA DP GA DP
ArgoUML
Create Note 24 13 0.54 0.58 7,080 2.13
Create Class, Create Note 73 19 0.52 0.60 10,800 4.33
JHotDraw
Draw Rectangle 17 21 0.39 0.67 2,040 0.13
Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64
Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86
Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
21
Programs Scenarios
Number of
Segments
Fitness Time (s)
GA DP GA DP GA DP
ArgoUML
Create Note 24 13 0.54 0.58 7,080 2.13
Create Class, Create Note 73 19 0.52 0.60 10,800 4.33
JHotDraw
Draw Rectangle 17 21 0.39 0.67 2,040 0.13
Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64
Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86
Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
Programs Scenarios
Number of
Segments
Fitness Time (s)
GA DP GA DP GA DP
ArgoUML
Create Note 24 13 0.54 0.58 7,080 2.13
Create Class, Create Note 73 19 0.52 0.60 10,800 4.33
JHotDraw
Draw Rectangle 17 21 0.39 0.67 2,040 0.13
Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64
Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86
Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
• Wilxocon test and Cliff’s delta effect size:
Difference of the number of segments;
Values of fitness function;
Computation times.
22
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
• Wilxocon test and Cliff’s delta effect size:
Difference of the number of segments;
Values of fitness function;
Computation times.
22
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
• Wilxocon test and Cliff’s delta effect size:
Difference of the number of segments;
Values of fitness function;
Computation times.
22
Case Study Results
• RQ1: How do the performances of the GA and
DP approaches compare?
• Wilxocon test and Cliff’s delta effect size:
Difference of the number of segments;
Values of fitness function;
Computation times.
22
Case Study Results
• RQ2: How do the GA and DP approaches
perform?
23
Program Scenario Concept
Jaccard Precision
GA DP GA DP
ArgoUML
Create Note Create Note 0.33 0.87 1 0.99
Create Class, Create Note Create Class 0.26 0.53 1 1
Create Class, Create Note Create Note 0.34 0.56 1 1
JHotDraw
Draw Rectangle Draw Rectangle 0.9 0.75 0.9 1
Add Text, Draw Rectangle Add Text 0.31 0.33 0.36 0.39
Add Text, Draw Rectangle Draw Rectangle 0.62 0.52 0.62 1
Draw Rectangle, Cut Rectangle Draw Rectangle 0.74 0.24 0.79 0.24
Draw Rectangle, Cut Rectangle Cut Rectangle 0.22 0.31 1 1
Spawn Window, Draw Circle Draw Circle 0.82 0.82 0.82 1
Spawn Window, Draw Circle Spawn Window 0.42 0.44 1 1
Case Study Results
• RQ2: How do the GA and DP approaches
perform?
23
Program Scenario Concept
Jaccard Precision
GA DP GA DP
ArgoUML
Create Note Create Note 0.33 0.87 1 0.99
Create Class, Create Note Create Class 0.26 0.53 1 1
Create Class, Create Note Create Note 0.34 0.56 1 1
JHotDraw
Draw Rectangle Draw Rectangle 0.9 0.75 0.9 1
Add Text, Draw Rectangle Add Text 0.31 0.33 0.36 0.39
Add Text, Draw Rectangle Draw Rectangle 0.62 0.52 0.62 1
Draw Rectangle, Cut Rectangle Draw Rectangle 0.74 0.24 0.79 0.24
Draw Rectangle, Cut Rectangle Cut Rectangle 0.22 0.31 1 1
Spawn Window, Draw Circle Draw Circle 0.82 0.82 0.82 1
Spawn Window, Draw Circle Spawn Window 0.42 0.44 1 1
Case Study Results
• RQ2: How do the GA and DP approaches
perform?
• Wilxocon test and Cliff’s delta effect size:
Jaccad scores
Precision
24
Case Study Results
• RQ2: How do the GA and DP approaches
perform?
• Wilxocon test and Cliff’s delta effect size:
Jaccad scores
Precision
24
Case Study Results
• RQ2: How do the GA and DP approaches
perform?
• Wilxocon test and Cliff’s delta effect size:
Jaccad scores
Precision
24
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
25
Segments Merging
• Multi-threading: induces variability in traces
collected for a given scenario.
26
Segments Merging
• Multi-threading: induces variability in traces
collected for a given scenario.
• Scenario draw rectangle:
26
First 	

Execution
Second	

Execution
Third 	

Execution
Original size
Compressed size
Number of segments
Segments Merging
• Multi-threading: induces variability in traces
collected for a given scenario.
• Scenario draw rectangle:
26
First 	

Execution
Second	

Execution
Third 	

Execution
4850 Methods
555 Methods
35 Segments
Original size
Compressed size
Number of segments
Segments Merging
• Multi-threading: induces variability in traces
collected for a given scenario.
• Scenario draw rectangle:
26
First 	

Execution
Second	

Execution
Third 	

Execution
4850 Methods
555 Methods
35 Segments
6668 Methods
240 Methods
21 Segments
Original size
Compressed size
Number of segments
Segments Merging
• Multi-threading: induces variability in traces
collected for a given scenario.
• Scenario draw rectangle:
26
First 	

Execution
Second	

Execution
Third 	

Execution
4850 Methods
555 Methods
35 Segments
6668 Methods
240 Methods
21 Segments
16 706
Methods
930 Methods
54 Segments
Original size
Compressed size
Number of segments
Segments Merging
• We merge segments obtained in multiple
executions of the same scenario.
• Similarity:
27
Segments Merging
• We merge segments obtained in multiple
executions of the same scenario.
• Similarity:
27
Threshold
Similarity Threshold
28
• Projects:
30 40 50 60 70 80 90
10305070
Similarity threshold
Numberofdifferentterms
False positives
False negatives
Segments Merging
• Example:
29
Segments Merging
• Example:
Threshold= 70%
29
Segments Merging
• Example:
Threshold= 70%
S1 S2 S3 S4
Z1 0.9 0.5 0.2 0.3
Z2 0.3 0.9 0.5 0.2
Z3 0.3 0.2 0.5 0.8
29
Segments Merging
• Example:
Threshold= 70%
S1 S2 S3 S4
Z1 0.9 0.5 0.2 0.3
Z2 0.3 0.9 0.5 0.2
Z3 0.3 0.2 0.5 0.8
S1 S2 S3 S4
Z1 0.9 0.5 0.2 0.3
Z2 0.3 0.9 0.5 0.2
Z3 0.3 0.2 0.5 0.8
29
Segments Merging
• Example:
Threshold= 70%
S1 S2 S3 S4
Z1 0.9 0.5 0.2 0.3
Z2 0.3 0.9 0.5 0.2
Z3 0.3 0.2 0.5 0.8
S1 S2 S3 S4
Z1 0.9 0.5 0.2 0.3
Z2 0.3 0.9 0.5 0.2
Z3 0.3 0.2 0.5 0.8
Synthetic Trace
29
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
30
Segments Labelling
• Issue: choice of the most appropriate source of
information.
31
Segments Labelling
• Issue: choice of the most appropriate source of
information.
• Method bodies:
31
Segments Labelling
• Issue: choice of the most appropriate source of
information.
• Method bodies:
Identifiers;
31
Segments Labelling
• Issue: choice of the most appropriate source of
information.
• Method bodies:
Identifiers;
Comments;
31
Segments Labelling
• Issue: choice of the most appropriate source of
information.
• Method bodies:
Identifiers;
Comments;
• Method signature.
31
Segments Labelling
• Issue: choice of the most appropriate source of
information.
• Method bodies:
Identifiers;
Comments;
• Method signature.
Method signatures provide more
meaningful terms when labeling
software artifacts than other sources.
[De Lucia, 2012]
31
Segments Labelling
• Source of information: terms contained in the
signature of methods.
• Hypothesis: a term appearing often in a
particular segment, but not in other segments,
provides important information for that
segment.
• Ranks the terms of the segment by TF-IDF and
keeps the topmost ones.
32
Segments Labelling
• To reduce the time and effort: segments are
characterized using some unique methods
(TF-IDF).
• Small version (5): result in loss of relevant
information.
• Medium version(15): preserve better the
relevant information.
33
• Research Questions:
• RQ1: How do the labels produced by the
participants change when providing them
different amount of information?
Experiment Design
34
• Research Questions:
• RQ1: How do the labels produced by the
participants change when providing them
different amount of information?
• RQ2: How do the labels produced by the
participants compare to the generated labels?
Experiment Design
34
• Research Questions:
• RQ1: How do the labels produced by the
participants change when providing them
different amount of information?
• RQ2: How do the labels produced by the
participants compare to the generated labels?
• Projects:
Experiment Design
34
• Research Questions:
• RQ1: How do the labels produced by the
participants change when providing them
different amount of information?
• RQ2: How do the labels produced by the
participants compare to the generated labels?
• Projects:
• Participants:
Experiment Design
0 10 21 31
Student Professional
34
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
Experiment Design
35
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• We group participants into 3 groups. Each
version is assigned to a different group.
Experiment Design
35
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• We group participants into 3 groups. Each
version is assigned to a different group.
Experiment Design
35
Terms of small
segment
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• We group participants into 3 groups. Each
version is assigned to a different group.
Experiment Design
35
Terms of small
segment
Terms of medium
segment
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• We group participants into 3 groups. Each
version is assigned to a different group.
Experiment Design
35
Terms of small
segment
Terms of medium
segment
Terms of full
segment
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• We group participants into 3 groups. Each
version is assigned to a different group.
Experiment Design
35
Terms of small
segment
Terms of medium
segment
Terms of full
segment
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
Experiment Results
36
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
Experiment Results
36
Medium Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
5856 50
68
Small Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
4644 44
52
Precision Recall
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
Experiment Results
36
Medium Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
5856 50
68
Small Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
4644 44
52
Precision Recall
92%
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
Experiment Results
36
Medium Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
5856 50
68
Small Version
0%
25%
50%
75%
100%
2 Participants 5 Participants
4644 44
52
Precision Recall
92% 76%
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• Two-way permutation test:
Number of participants;
Size of the segment (full, medium, small);
Their interaction;
Years of programming experience.
Experiment Results
37
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• Two-way permutation test:
Number of participants;
Size of the segment (full, medium, small);
Their interaction;
Years of programming experience.
Experiment Results
37
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• Two-way permutation test:
Number of participants;
Size of the segment (full, medium, small);
Their interaction;
Years of programming experience.
Experiment Results
37
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• Two-way permutation test:
Number of participants;
Size of the segment (full, medium, small);
Their interaction;
Years of programming experience.
Experiment Results
37
• RQ1: How do the labels produced by the
participants change when providing them different
amount of information?
• Two-way permutation test:
Number of participants;
Size of the segment (full, medium, small);
Their interaction;
Years of programming experience.
Experiment Results
37
• RQ2: How do the labels produced by the
participants compare to the generated labels?
Experiment Design
38
• RQ2: How do the labels produced by the
participants compare to the generated labels?
• Oracle: 210 segments (less than 100)
manually labelled by the participants.
Experiment Design
38
• RQ2: How do the labels produced by the
participants compare to the generated labels?
• Oracle: 210 segments (less than 100)
manually labelled by the participants.
• Evaluation: 1 participant and 2 participants.
Experiment Design
38
• RQ2: How do the labels produced by the
participants compare to the generated labels?
Experiment Results
39
• RQ2: How do the labels produced by the
participants compare to the generated labels?
Experiment Results
39
0%
25%
50%
75%
100%
1 Participant 2 Participants Intersection 2 Participants Union
56
85
71
60
38
58
Precision Recall
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
40
Segments Relations
• Formal Concept Analysis: used to identify
relations between concepts identified in
different segments.
• Groups objects that have common attributes:
objects are segments and attributes are terms.
• An FCA concept: maximal collection of objects
that have common attributes.
41
• FCA lattice for the execution trace of the
scenario create a class.
42
Segments Relations
• FCA lattice for the execution trace of the
scenario create a class.
42
Segments Relations
• FCA lattice for the execution trace of the
scenario create a class.
42
Segments Relations
• FCA lattice for the execution trace of the
scenario create a class.
42
Segments Relations
• FCA lattice for the trace of the scenario draw
rectangle delete rectangle (32 segments).
43
Segments Relations
• FCA lattice for the trace of the scenario draw
rectangle delete rectangle (32 segments).
43
Segments Relations
Difficult and demanding
task of identifying
relations
• Research Questions:
• RQ1: To what extent does our approach correctly
identify relations among segments?
Experiment Design
44
• Research Questions:
• RQ1: To what extent does our approach correctly
identify relations among segments?
• Projects:
Experiment Design
44
• Research Questions:
• RQ1: To what extent does our approach correctly
identify relations among segments?
• Projects:
• Participants:
Experiment Design
0 10 21 31
Student Professional
44
• RQ1: To what extent does our approach correctly
identify relations among segments?
Experiment Design
45
• RQ1: To what extent does our approach correctly
identify relations among segments?
• 100 relations are validated by participants.
Experiment Design
45
• RQ1: To what extent does our approach correctly
identify relations among segments?
• 100 relations are validated by participants.
Experiment Design
45
Segments Labels Relations
Our Approach
9 listener, add, change, figure
sub/super
concept
10 figure, listener, add, internal, multicaster, event, change
Participant 1
9 composite, figure, trigger, event
sub/super
concept
10 manage, figure, change, event, trigger
Participant 2
9 abstract, figure, change, add, listener
same concept
10 figure, change, event, multicaster, add, listener
• RQ1: To what extent does our approach correctly
identify relations among segments?
Experiment Results
46
• RQ1: To what extent does our approach correctly
identify relations among segments?
Experiment Results
46
0%
25%
50%
75%
100%
ArgoUML JHotDraw Mars Maze Pooka
78
83
9
100
33
82
100100100100
Agreements without distinction btw. sub/super relations
Agreements with distinction btw. sub/super relations
• RQ1: To what extent does our approach correctly
identify relations among segments?
Experiment Results
46
0%
25%
50%
75%
100%
ArgoUML JHotDraw Mars Maze Pooka
78
83
9
100
33
82
100100100100
Agreements without distinction btw. sub/super relations
Agreements with distinction btw. sub/super relations
96%	

63%
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
47
• During maintenance, developers are interested
to understand some segments of a trace that
implement some concepts of interest.
• Trace Segmentation approach groups these
concepts in few segments.
• Labelling and relating segments approach
guide developers towards segments that
implement the concepts to maintain and
reduce the number of methods to investigate.
48
Usefulness Evaluation
• Research Questions:
• RQ1: Does our trace segmentation has a
potential to support concept location?
• RQ2: To what extent does our approach support
concept location tasks?
• Projects:
Empirical Study
49
The dataset was made
publicly available by
Dit et al., [2013]
• RQ1: Does our trace segmentation approach has a
potential to support concept location?
Empirical Results
50
• RQ1: Does our trace segmentation approach has a
potential to support concept location?
Empirical Results
50
Number of
Segments
0
35
70
Min Mean Max
68
35,9
14
52,21
• RQ1: Does our trace segmentation approach has a
potential to support concept location?
Empirical Results
50
Number of
Segments
0
35
70
Min Mean Max
68
35,9
14
52,21
Percentage of the
size of the
segments
0%
3,5%
7%
Mean Max
6,47
2,48
• RQ2: To what extent does our approach support
concept location tasks if used as a standalone
technique?
• Title of the bug report;
• Labels of the segments;
• FCA lattice.
Empirical Results
51
• RQ2: To what extent does our approach support
concept location tasks if used as a standalone
technique?
Empirical Results
52
0%
20%
40%
60%
80%
10 20 30 40 50 60 70 80 90 100
57565454
51
48
43
37
33
27
70696968
62
59
53
47
43
36
75747473
6765
57
51
45
35
Retrieving segments containing gold set methods
Retrieving gold set methods
Number of methods to understand
Outline
• Context
• Problem Statement
• Trace Segmentation
• Segments Merging
• Segments Labelling
• Segments Relations
• Usefulness Evaluation
• Conclusion and Future Work
53
Conclusion
• A typical scenario in which concept location
takes part:
Execution Trace
54
Program Failure
55
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Usefulness
Evaluation
(JSEP14)
Large, noisy, and multi-threaded
55
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Usefulness
Evaluation
(JSEP14)
Large, noisy, and multi-threaded
Thesis
56
Identify concepts and facilitate the analysis
of large execution traces for maintenance
tasks.
Future Work
• A tool to visualize the identified relations
among segments.
• Adapting our approach to online labelling of
traces while they are being generated.
!
• Trace segmentation of distributed systems.
57
58
Trace
Segmentation
(SSBSE’11)
Labeling
Segments
(WCRE’12,
JSEP14)
Relating
Segments
(WCRE’12,
JSEP14)
Usefulness
Evaluation
(JSEP14)
References
!
•Ian Sommerville. Software Engineering. Addison-Wesley, 6th edition, 2000.
•Dehaghani S. M. H et Hajrahimi N., 2013 . Which factors affect software
projects maintenance cost more? Acta Informatica Medica, 21, 63–66.
•REISS, S. P. et RENIERIS, M. (2001). Encoding program executions.(ICSE),
221–230.
•HAMOU-LHADJ, A. et LETHBRIDGE, T. (2006). Summarizing the content of
large traces to facilitate the understanding of the behaviour of a software
system. (ICPC), 181–190.
•HAMOU-LHADJ, A., BRAUN, E., AMYOT, D. et LETHBRIDGE, T. (2005).
Recovering behavioral design models from execution traces.(CSMR), 112–
121.
•PIRZADEH, H. et HAMOU-LHADJ, A. (2011). A novel approach based on
gestalt psychology for abstracting the content of large execution traces for
program comprehension. (ICECCS), 221–230.
!
!
59
References
• ASADI, F., ANTONIOL, G. et GUÉHÉNEUC, Y.-G. (2010). Concept locations
with genetic algorithms: A comparison of four distributed architectures.
(SSBSE), 153–162.
•DE LUCIA, A., DI PENTA, M., OLIVETO, R., PANICHELLA, A. et PANICHELLA,
S. (2012). Using IR methods for labeling source code artifacts: Is it
worthwhile? (ICPC), 193–202.
•DIT, B., HOLTZHAUER, A., POSHYVANYK, D. et KAGDI, H. (2013a). A dataset
from change history to support evaluation of software maintenance tasks.
(MSR), 131–134.
60

More Related Content

Similar to Thesis+of+soumaya+medini.ppt

Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02
Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02
Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02PMI_IREP_TP
 
Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)University of Washington
 
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptxTran273185
 
Program Management 2.0: Circle-Dot Charts and Communication
Program Management 2.0: Circle-Dot Charts and CommunicationProgram Management 2.0: Circle-Dot Charts and Communication
Program Management 2.0: Circle-Dot Charts and CommunicationJohn Carter
 
Professor abubakr lecture 1 cad 2016
Professor abubakr lecture 1 cad  2016Professor abubakr lecture 1 cad  2016
Professor abubakr lecture 1 cad 2016Abuubakr Abdelwahab
 
Technology-Driven Development: Using Automation and Development Techniques to...
Technology-Driven Development: Using Automation and Development Techniques to...Technology-Driven Development: Using Automation and Development Techniques to...
Technology-Driven Development: Using Automation and Development Techniques to...Rakuten Group, Inc.
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
Agile Estimation for Fixed Price Model
Agile Estimation for Fixed Price ModelAgile Estimation for Fixed Price Model
Agile Estimation for Fixed Price Modeljayanth72
 
Growing as a software craftsperson (part 1) From Pune Software Craftsmanship.
Growing as a software craftsperson (part 1)  From Pune Software Craftsmanship.Growing as a software craftsperson (part 1)  From Pune Software Craftsmanship.
Growing as a software craftsperson (part 1) From Pune Software Craftsmanship.Dattatray Kale
 
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...Yuji Oyamada
 
224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case StudyESEM 2014
 
Steps in Simulation Study
Steps in Simulation StudySteps in Simulation Study
Steps in Simulation StudyNalin Adhikari
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Reengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachReengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachIJECEIAES
 
Siemens-PLM-NX-CAE-overview-br-X24
Siemens-PLM-NX-CAE-overview-br-X24Siemens-PLM-NX-CAE-overview-br-X24
Siemens-PLM-NX-CAE-overview-br-X24Brady Walther
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
FYP Presentation afnan
FYP Presentation afnanFYP Presentation afnan
FYP Presentation afnanMOHAMADAFNAN1
 

Similar to Thesis+of+soumaya+medini.ppt (20)

Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02
Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02
Day1 1620-1705-maple-pranabendubhattacharyya-131008043643-phpapp02
 
Flowcharting week 5 2019 2020
Flowcharting week 5  2019  2020Flowcharting week 5  2019  2020
Flowcharting week 5 2019 2020
 
Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)
 
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx
01-Introduction_to_Optimization-v2021.2-Sept23-2021.pptx
 
Program Management 2.0: Circle-Dot Charts and Communication
Program Management 2.0: Circle-Dot Charts and CommunicationProgram Management 2.0: Circle-Dot Charts and Communication
Program Management 2.0: Circle-Dot Charts and Communication
 
Professor abubakr lecture 1 cad 2016
Professor abubakr lecture 1 cad  2016Professor abubakr lecture 1 cad  2016
Professor abubakr lecture 1 cad 2016
 
Technology-Driven Development: Using Automation and Development Techniques to...
Technology-Driven Development: Using Automation and Development Techniques to...Technology-Driven Development: Using Automation and Development Techniques to...
Technology-Driven Development: Using Automation and Development Techniques to...
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Agile Estimation for Fixed Price Model
Agile Estimation for Fixed Price ModelAgile Estimation for Fixed Price Model
Agile Estimation for Fixed Price Model
 
Growing as a software craftsperson (part 1) From Pune Software Craftsmanship.
Growing as a software craftsperson (part 1)  From Pune Software Craftsmanship.Growing as a software craftsperson (part 1)  From Pune Software Craftsmanship.
Growing as a software craftsperson (part 1) From Pune Software Craftsmanship.
 
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...
Vision Based Analysis on Trajectories of Notes Representing Ideas Toward Work...
 
224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study
 
6sigma
6sigma6sigma
6sigma
 
Steps in Simulation Study
Steps in Simulation StudySteps in Simulation Study
Steps in Simulation Study
 
LECT9.ppt
LECT9.pptLECT9.ppt
LECT9.ppt
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Reengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachReengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approach
 
Siemens-PLM-NX-CAE-overview-br-X24
Siemens-PLM-NX-CAE-overview-br-X24Siemens-PLM-NX-CAE-overview-br-X24
Siemens-PLM-NX-CAE-overview-br-X24
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
FYP Presentation afnan
FYP Presentation afnanFYP Presentation afnan
FYP Presentation afnan
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software MiniaturisationPtidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel BriandPtidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel AbdellatifPtidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh KermansaraviPtidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano PolitowskiPtidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisisPtidej Team
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptPtidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Recently uploaded

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 

Recently uploaded (20)

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 

Thesis+of+soumaya+medini.ppt

  • 1. CONCEPTS EXTRACTION FROM EXECUTION TRACES Montréal, November 17th 2014 Soumaya Medini ! Ph.D. Defense
  • 2. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 2
  • 3. Context • Software maintenance effort is estimated to be more than 70% of the overall software cost. [Ian Sommerville, 2000] • Program comprehension require half of the effort devoted to software maintenance and evolution. [Dehaghani et Hajrahimi, 2013] 3 Maintenance Integration Unit Testing Coding Design Specifications Requirements Software Life-Cycle Costs
  • 4. • Understand a program: identify which concept this program implements. • Concept location aims at identifying concepts and locating them within code regions. • A concept represents a functionality of a program. 4 Context
  • 5. Motivation • A typical scenario in which concept location takes part: 5
  • 6. Motivation • A typical scenario in which concept location takes part: 5 Program Failure
  • 7. Motivation • A typical scenario in which concept location takes part: 5 Program Failure
  • 8. Motivation • A typical scenario in which concept location takes part: Execution Trace 5 Program Failure
  • 9. Motivation • A typical scenario in which concept location takes part: Execution Trace 5 Program Failure Analysing the one execution trace to identify the sequence of methods producing the failure.
  • 10. Problem Statement • Large and noisy: • Execution trace corresponding to draw a rectangle in JHotDraw contains 4,000 method calls. 6
  • 11. Problem Statement • Several approaches address these problems: • Compacting execution traces (encoding the whole execution as a directed acyclic graph) [Reiss and Renieris, 2001] • Building high-level behavioural models (detecting and filtering utilities) [Hamou-Lhadj et al., 2005] • Segmenting execution traces (textual analysis or clustering algorithms) [Asadi et al., 2010] [Pirzadeh and Hamou-Lhadj, 2011] 7
  • 12. Problem Statement • Several approaches address these problems: • Compacting execution traces (encoding the whole execution as a directed acyclic graph) [Reiss and Renieris, 2001] • Building high-level behavioural models (detecting and filtering utilities) [Hamou-Lhadj et al., 2005] • Segmenting execution traces (textual analysis or clustering algorithms) [Asadi et al., 2010] [Pirzadeh and Hamou-Lhadj, 2011] 7 None of these approaches guide developers towards segments that implements the concepts to maintain.
  • 13. Thesis 8 Identify concepts and facilitate the analysis of large execution traces for maintenance tasks.
  • 21. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 10
  • 22. Trace Segmentation • Asadi et al. [2010]: identify concepts in execution trace by finding cohesive and decoupled fragments of the trace using Genetic Algorithm (GA). • Limitations: • Not scalable (7 hours). • Stability problems (different segmentation). 11
  • 23. Background • Steps: 1. System instrumentation and trace collection; 2. Pruning and compressing traces; 3. Textual analysis of method source code; 4. Trace splitting using optimization techniques. 12
  • 24. Step1: Program instrumentation and trace collection • We collect and tag traces. 13
  • 25. Step2: Pruning and compressing traces • Pruning: Remove too frequent method invocations. • Compressing: Remove repetitions. 14
  • 26. Step3: Textual analysis of Method source code • Extract identifiers from source code and comments. • Split identifiers using Camel-Case (getBook get and book). • Perform stemming (waited,waiting,waits wait). • Remove programming language keywords and english stop words. • Index terms and documents using the TF-IDF indexing mechanisms and apply LSI. 15
  • 27. Step4: Trace Splitting through optimization techniques • Execution trace segmentation solution must be found in large search spaces. • We must apply some optimization techniques to segment the trace. • Approach built upon a dynamic programming algorithm to: • Improve scalability; • Compute the exact splitting. 16
  • 28. Dynamic Programming (DP) Approach • Solve a problem by dividing the problem into sub-problems that are recursively solved. • The solution of the problem: combining the solutions of the sub-problems. • The quality of the segmentation of a trace into K segments: 17
  • 33. • Example of trace segmentation using DP. 19 Dynamic Programming (DP) Approach
  • 34. • Example of trace segmentation using DP. • Create a new segment. 19 Dynamic Programming (DP) Approach
  • 35. • Example of trace segmentation using DP. • Create a new segment. • Add the method to the last segment. 19 Dynamic Programming (DP) Approach
  • 36. Case Study Design • Research Questions: • RQ1: How do the performances of the GA and DP approaches compare? • RQ2: How do the GA and DP approaches perform? • Programs: 20
  • 37. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? 21 Programs Scenarios Number of Segments Fitness Time (s) GA DP GA DP GA DP ArgoUML Create Note 24 13 0.54 0.58 7,080 2.13 Create Class, Create Note 73 19 0.52 0.60 10,800 4.33 JHotDraw Draw Rectangle 17 21 0.39 0.67 2,040 0.13 Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64 Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86 Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
  • 38. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? 21 Programs Scenarios Number of Segments Fitness Time (s) GA DP GA DP GA DP ArgoUML Create Note 24 13 0.54 0.58 7,080 2.13 Create Class, Create Note 73 19 0.52 0.60 10,800 4.33 JHotDraw Draw Rectangle 17 21 0.39 0.67 2,040 0.13 Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64 Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86 Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
  • 39. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? 21 Programs Scenarios Number of Segments Fitness Time (s) GA DP GA DP GA DP ArgoUML Create Note 24 13 0.54 0.58 7,080 2.13 Create Class, Create Note 73 19 0.52 0.60 10,800 4.33 JHotDraw Draw Rectangle 17 21 0.39 0.67 2,040 0.13 Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64 Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86 Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
  • 40. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? 21 Programs Scenarios Number of Segments Fitness Time (s) GA DP GA DP GA DP ArgoUML Create Note 24 13 0.54 0.58 7,080 2.13 Create Class, Create Note 73 19 0.52 0.60 10,800 4.33 JHotDraw Draw Rectangle 17 21 0.39 0.67 2,040 0.13 Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64 Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86 Spawn Window, Draw Circle 63 26 0.34 0.69 240 1 Programs Scenarios Number of Segments Fitness Time (s) GA DP GA DP GA DP ArgoUML Create Note 24 13 0.54 0.58 7,080 2.13 Create Class, Create Note 73 19 0.52 0.60 10,800 4.33 JHotDraw Draw Rectangle 17 21 0.39 0.67 2,040 0.13 Add Text, Draw Rectangle 21 21 0.38 0.69 1,260 0.64 Draw Rectangle, Cut Rectangle 56 20 0.46 0.72 1,200 0.86 Spawn Window, Draw Circle 63 26 0.34 0.69 240 1
  • 41. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? • Wilxocon test and Cliff’s delta effect size: Difference of the number of segments; Values of fitness function; Computation times. 22
  • 42. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? • Wilxocon test and Cliff’s delta effect size: Difference of the number of segments; Values of fitness function; Computation times. 22
  • 43. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? • Wilxocon test and Cliff’s delta effect size: Difference of the number of segments; Values of fitness function; Computation times. 22
  • 44. Case Study Results • RQ1: How do the performances of the GA and DP approaches compare? • Wilxocon test and Cliff’s delta effect size: Difference of the number of segments; Values of fitness function; Computation times. 22
  • 45. Case Study Results • RQ2: How do the GA and DP approaches perform? 23 Program Scenario Concept Jaccard Precision GA DP GA DP ArgoUML Create Note Create Note 0.33 0.87 1 0.99 Create Class, Create Note Create Class 0.26 0.53 1 1 Create Class, Create Note Create Note 0.34 0.56 1 1 JHotDraw Draw Rectangle Draw Rectangle 0.9 0.75 0.9 1 Add Text, Draw Rectangle Add Text 0.31 0.33 0.36 0.39 Add Text, Draw Rectangle Draw Rectangle 0.62 0.52 0.62 1 Draw Rectangle, Cut Rectangle Draw Rectangle 0.74 0.24 0.79 0.24 Draw Rectangle, Cut Rectangle Cut Rectangle 0.22 0.31 1 1 Spawn Window, Draw Circle Draw Circle 0.82 0.82 0.82 1 Spawn Window, Draw Circle Spawn Window 0.42 0.44 1 1
  • 46. Case Study Results • RQ2: How do the GA and DP approaches perform? 23 Program Scenario Concept Jaccard Precision GA DP GA DP ArgoUML Create Note Create Note 0.33 0.87 1 0.99 Create Class, Create Note Create Class 0.26 0.53 1 1 Create Class, Create Note Create Note 0.34 0.56 1 1 JHotDraw Draw Rectangle Draw Rectangle 0.9 0.75 0.9 1 Add Text, Draw Rectangle Add Text 0.31 0.33 0.36 0.39 Add Text, Draw Rectangle Draw Rectangle 0.62 0.52 0.62 1 Draw Rectangle, Cut Rectangle Draw Rectangle 0.74 0.24 0.79 0.24 Draw Rectangle, Cut Rectangle Cut Rectangle 0.22 0.31 1 1 Spawn Window, Draw Circle Draw Circle 0.82 0.82 0.82 1 Spawn Window, Draw Circle Spawn Window 0.42 0.44 1 1
  • 47. Case Study Results • RQ2: How do the GA and DP approaches perform? • Wilxocon test and Cliff’s delta effect size: Jaccad scores Precision 24
  • 48. Case Study Results • RQ2: How do the GA and DP approaches perform? • Wilxocon test and Cliff’s delta effect size: Jaccad scores Precision 24
  • 49. Case Study Results • RQ2: How do the GA and DP approaches perform? • Wilxocon test and Cliff’s delta effect size: Jaccad scores Precision 24
  • 50. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 25
  • 51. Segments Merging • Multi-threading: induces variability in traces collected for a given scenario. 26
  • 52. Segments Merging • Multi-threading: induces variability in traces collected for a given scenario. • Scenario draw rectangle: 26 First Execution Second Execution Third Execution Original size Compressed size Number of segments
  • 53. Segments Merging • Multi-threading: induces variability in traces collected for a given scenario. • Scenario draw rectangle: 26 First Execution Second Execution Third Execution 4850 Methods 555 Methods 35 Segments Original size Compressed size Number of segments
  • 54. Segments Merging • Multi-threading: induces variability in traces collected for a given scenario. • Scenario draw rectangle: 26 First Execution Second Execution Third Execution 4850 Methods 555 Methods 35 Segments 6668 Methods 240 Methods 21 Segments Original size Compressed size Number of segments
  • 55. Segments Merging • Multi-threading: induces variability in traces collected for a given scenario. • Scenario draw rectangle: 26 First Execution Second Execution Third Execution 4850 Methods 555 Methods 35 Segments 6668 Methods 240 Methods 21 Segments 16 706 Methods 930 Methods 54 Segments Original size Compressed size Number of segments
  • 56. Segments Merging • We merge segments obtained in multiple executions of the same scenario. • Similarity: 27
  • 57. Segments Merging • We merge segments obtained in multiple executions of the same scenario. • Similarity: 27 Threshold
  • 58. Similarity Threshold 28 • Projects: 30 40 50 60 70 80 90 10305070 Similarity threshold Numberofdifferentterms False positives False negatives
  • 61. Segments Merging • Example: Threshold= 70% S1 S2 S3 S4 Z1 0.9 0.5 0.2 0.3 Z2 0.3 0.9 0.5 0.2 Z3 0.3 0.2 0.5 0.8 29
  • 62. Segments Merging • Example: Threshold= 70% S1 S2 S3 S4 Z1 0.9 0.5 0.2 0.3 Z2 0.3 0.9 0.5 0.2 Z3 0.3 0.2 0.5 0.8 S1 S2 S3 S4 Z1 0.9 0.5 0.2 0.3 Z2 0.3 0.9 0.5 0.2 Z3 0.3 0.2 0.5 0.8 29
  • 63. Segments Merging • Example: Threshold= 70% S1 S2 S3 S4 Z1 0.9 0.5 0.2 0.3 Z2 0.3 0.9 0.5 0.2 Z3 0.3 0.2 0.5 0.8 S1 S2 S3 S4 Z1 0.9 0.5 0.2 0.3 Z2 0.3 0.9 0.5 0.2 Z3 0.3 0.2 0.5 0.8 Synthetic Trace 29
  • 64. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 30
  • 65. Segments Labelling • Issue: choice of the most appropriate source of information. 31
  • 66. Segments Labelling • Issue: choice of the most appropriate source of information. • Method bodies: 31
  • 67. Segments Labelling • Issue: choice of the most appropriate source of information. • Method bodies: Identifiers; 31
  • 68. Segments Labelling • Issue: choice of the most appropriate source of information. • Method bodies: Identifiers; Comments; 31
  • 69. Segments Labelling • Issue: choice of the most appropriate source of information. • Method bodies: Identifiers; Comments; • Method signature. 31
  • 70. Segments Labelling • Issue: choice of the most appropriate source of information. • Method bodies: Identifiers; Comments; • Method signature. Method signatures provide more meaningful terms when labeling software artifacts than other sources. [De Lucia, 2012] 31
  • 71. Segments Labelling • Source of information: terms contained in the signature of methods. • Hypothesis: a term appearing often in a particular segment, but not in other segments, provides important information for that segment. • Ranks the terms of the segment by TF-IDF and keeps the topmost ones. 32
  • 72. Segments Labelling • To reduce the time and effort: segments are characterized using some unique methods (TF-IDF). • Small version (5): result in loss of relevant information. • Medium version(15): preserve better the relevant information. 33
  • 73. • Research Questions: • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Design 34
  • 74. • Research Questions: • RQ1: How do the labels produced by the participants change when providing them different amount of information? • RQ2: How do the labels produced by the participants compare to the generated labels? Experiment Design 34
  • 75. • Research Questions: • RQ1: How do the labels produced by the participants change when providing them different amount of information? • RQ2: How do the labels produced by the participants compare to the generated labels? • Projects: Experiment Design 34
  • 76. • Research Questions: • RQ1: How do the labels produced by the participants change when providing them different amount of information? • RQ2: How do the labels produced by the participants compare to the generated labels? • Projects: • Participants: Experiment Design 0 10 21 31 Student Professional 34
  • 77. • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Design 35
  • 78. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • We group participants into 3 groups. Each version is assigned to a different group. Experiment Design 35
  • 79. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • We group participants into 3 groups. Each version is assigned to a different group. Experiment Design 35 Terms of small segment
  • 80. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • We group participants into 3 groups. Each version is assigned to a different group. Experiment Design 35 Terms of small segment Terms of medium segment
  • 81. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • We group participants into 3 groups. Each version is assigned to a different group. Experiment Design 35 Terms of small segment Terms of medium segment Terms of full segment
  • 82. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • We group participants into 3 groups. Each version is assigned to a different group. Experiment Design 35 Terms of small segment Terms of medium segment Terms of full segment
  • 83. • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Results 36
  • 84. • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Results 36 Medium Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 5856 50 68 Small Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 4644 44 52 Precision Recall
  • 85. • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Results 36 Medium Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 5856 50 68 Small Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 4644 44 52 Precision Recall 92%
  • 86. • RQ1: How do the labels produced by the participants change when providing them different amount of information? Experiment Results 36 Medium Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 5856 50 68 Small Version 0% 25% 50% 75% 100% 2 Participants 5 Participants 4644 44 52 Precision Recall 92% 76%
  • 87. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • Two-way permutation test: Number of participants; Size of the segment (full, medium, small); Their interaction; Years of programming experience. Experiment Results 37
  • 88. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • Two-way permutation test: Number of participants; Size of the segment (full, medium, small); Their interaction; Years of programming experience. Experiment Results 37
  • 89. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • Two-way permutation test: Number of participants; Size of the segment (full, medium, small); Their interaction; Years of programming experience. Experiment Results 37
  • 90. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • Two-way permutation test: Number of participants; Size of the segment (full, medium, small); Their interaction; Years of programming experience. Experiment Results 37
  • 91. • RQ1: How do the labels produced by the participants change when providing them different amount of information? • Two-way permutation test: Number of participants; Size of the segment (full, medium, small); Their interaction; Years of programming experience. Experiment Results 37
  • 92. • RQ2: How do the labels produced by the participants compare to the generated labels? Experiment Design 38
  • 93. • RQ2: How do the labels produced by the participants compare to the generated labels? • Oracle: 210 segments (less than 100) manually labelled by the participants. Experiment Design 38
  • 94. • RQ2: How do the labels produced by the participants compare to the generated labels? • Oracle: 210 segments (less than 100) manually labelled by the participants. • Evaluation: 1 participant and 2 participants. Experiment Design 38
  • 95. • RQ2: How do the labels produced by the participants compare to the generated labels? Experiment Results 39
  • 96. • RQ2: How do the labels produced by the participants compare to the generated labels? Experiment Results 39 0% 25% 50% 75% 100% 1 Participant 2 Participants Intersection 2 Participants Union 56 85 71 60 38 58 Precision Recall
  • 97. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 40
  • 98. Segments Relations • Formal Concept Analysis: used to identify relations between concepts identified in different segments. • Groups objects that have common attributes: objects are segments and attributes are terms. • An FCA concept: maximal collection of objects that have common attributes. 41
  • 99. • FCA lattice for the execution trace of the scenario create a class. 42 Segments Relations
  • 100. • FCA lattice for the execution trace of the scenario create a class. 42 Segments Relations
  • 101. • FCA lattice for the execution trace of the scenario create a class. 42 Segments Relations
  • 102. • FCA lattice for the execution trace of the scenario create a class. 42 Segments Relations
  • 103. • FCA lattice for the trace of the scenario draw rectangle delete rectangle (32 segments). 43 Segments Relations
  • 104. • FCA lattice for the trace of the scenario draw rectangle delete rectangle (32 segments). 43 Segments Relations Difficult and demanding task of identifying relations
  • 105. • Research Questions: • RQ1: To what extent does our approach correctly identify relations among segments? Experiment Design 44
  • 106. • Research Questions: • RQ1: To what extent does our approach correctly identify relations among segments? • Projects: Experiment Design 44
  • 107. • Research Questions: • RQ1: To what extent does our approach correctly identify relations among segments? • Projects: • Participants: Experiment Design 0 10 21 31 Student Professional 44
  • 108. • RQ1: To what extent does our approach correctly identify relations among segments? Experiment Design 45
  • 109. • RQ1: To what extent does our approach correctly identify relations among segments? • 100 relations are validated by participants. Experiment Design 45
  • 110. • RQ1: To what extent does our approach correctly identify relations among segments? • 100 relations are validated by participants. Experiment Design 45 Segments Labels Relations Our Approach 9 listener, add, change, figure sub/super concept 10 figure, listener, add, internal, multicaster, event, change Participant 1 9 composite, figure, trigger, event sub/super concept 10 manage, figure, change, event, trigger Participant 2 9 abstract, figure, change, add, listener same concept 10 figure, change, event, multicaster, add, listener
  • 111. • RQ1: To what extent does our approach correctly identify relations among segments? Experiment Results 46
  • 112. • RQ1: To what extent does our approach correctly identify relations among segments? Experiment Results 46 0% 25% 50% 75% 100% ArgoUML JHotDraw Mars Maze Pooka 78 83 9 100 33 82 100100100100 Agreements without distinction btw. sub/super relations Agreements with distinction btw. sub/super relations
  • 113. • RQ1: To what extent does our approach correctly identify relations among segments? Experiment Results 46 0% 25% 50% 75% 100% ArgoUML JHotDraw Mars Maze Pooka 78 83 9 100 33 82 100100100100 Agreements without distinction btw. sub/super relations Agreements with distinction btw. sub/super relations 96% 63%
  • 114. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 47
  • 115. • During maintenance, developers are interested to understand some segments of a trace that implement some concepts of interest. • Trace Segmentation approach groups these concepts in few segments. • Labelling and relating segments approach guide developers towards segments that implement the concepts to maintain and reduce the number of methods to investigate. 48 Usefulness Evaluation
  • 116. • Research Questions: • RQ1: Does our trace segmentation has a potential to support concept location? • RQ2: To what extent does our approach support concept location tasks? • Projects: Empirical Study 49 The dataset was made publicly available by Dit et al., [2013]
  • 117. • RQ1: Does our trace segmentation approach has a potential to support concept location? Empirical Results 50
  • 118. • RQ1: Does our trace segmentation approach has a potential to support concept location? Empirical Results 50 Number of Segments 0 35 70 Min Mean Max 68 35,9 14 52,21
  • 119. • RQ1: Does our trace segmentation approach has a potential to support concept location? Empirical Results 50 Number of Segments 0 35 70 Min Mean Max 68 35,9 14 52,21 Percentage of the size of the segments 0% 3,5% 7% Mean Max 6,47 2,48
  • 120. • RQ2: To what extent does our approach support concept location tasks if used as a standalone technique? • Title of the bug report; • Labels of the segments; • FCA lattice. Empirical Results 51
  • 121. • RQ2: To what extent does our approach support concept location tasks if used as a standalone technique? Empirical Results 52 0% 20% 40% 60% 80% 10 20 30 40 50 60 70 80 90 100 57565454 51 48 43 37 33 27 70696968 62 59 53 47 43 36 75747473 6765 57 51 45 35 Retrieving segments containing gold set methods Retrieving gold set methods Number of methods to understand
  • 122. Outline • Context • Problem Statement • Trace Segmentation • Segments Merging • Segments Labelling • Segments Relations • Usefulness Evaluation • Conclusion and Future Work 53
  • 123. Conclusion • A typical scenario in which concept location takes part: Execution Trace 54 Program Failure
  • 124. 55 Large, noisy, and multi-threaded
  • 134. Thesis 56 Identify concepts and facilitate the analysis of large execution traces for maintenance tasks.
  • 135. Future Work • A tool to visualize the identified relations among segments. • Adapting our approach to online labelling of traces while they are being generated. ! • Trace segmentation of distributed systems. 57
  • 137. References ! •Ian Sommerville. Software Engineering. Addison-Wesley, 6th edition, 2000. •Dehaghani S. M. H et Hajrahimi N., 2013 . Which factors affect software projects maintenance cost more? Acta Informatica Medica, 21, 63–66. •REISS, S. P. et RENIERIS, M. (2001). Encoding program executions.(ICSE), 221–230. •HAMOU-LHADJ, A. et LETHBRIDGE, T. (2006). Summarizing the content of large traces to facilitate the understanding of the behaviour of a software system. (ICPC), 181–190. •HAMOU-LHADJ, A., BRAUN, E., AMYOT, D. et LETHBRIDGE, T. (2005). Recovering behavioral design models from execution traces.(CSMR), 112– 121. •PIRZADEH, H. et HAMOU-LHADJ, A. (2011). A novel approach based on gestalt psychology for abstracting the content of large execution traces for program comprehension. (ICECCS), 221–230. ! ! 59
  • 138. References • ASADI, F., ANTONIOL, G. et GUÉHÉNEUC, Y.-G. (2010). Concept locations with genetic algorithms: A comparison of four distributed architectures. (SSBSE), 153–162. •DE LUCIA, A., DI PENTA, M., OLIVETO, R., PANICHELLA, A. et PANICHELLA, S. (2012). Using IR methods for labeling source code artifacts: Is it worthwhile? (ICPC), 193–202. •DIT, B., HOLTZHAUER, A., POSHYVANYK, D. et KAGDI, H. (2013a). A dataset from change history to support evaluation of software maintenance tasks. (MSR), 131–134. 60