A Heuristic-based Approach
to Identify Concepts
in Execution Traces
CSMR 2010 - Madrid (Spain) 1
Fatemeh Asadi*
Massimiliano Di Penta**
Giuliano Antoniol*
Yann-Gaël Guéhéneuc**
* Ecole Polytechnique de Montréal, Canada
** Dept. of Engineering – Univ. of Sannio, Italy
Motivations
• Software systems lack adequate documentation
• Developers try to understand systems through
– Static analyses, visualizations built upon static data
– Dynamic analyses, requiring the execution of the system
• (Dynamic) concept identification
– Identify sets of method calls in execution traces responsible
for the implementation of domain concepts or user-observable
features
– Existing approaches are based on static analysis [Anquetil and
Lethbridge (1998)], dynamic analysis [Wilde and Scully (1995),
Tonella and Ceccato (2004)], IR techniques [Poshyvanyk et
al. (2007)], or hybrid ones [Eaddy et al. (2008)]
Proposed approach
A novel approach that analyzes execution traces and
groups together method calls that:
(i) are invoked sequentially
(ii) are cohesive and decoupled from a conceptual point of view
Assumptions
Consider a feature being executed in a scenario
– e.g., “Open a Web page from a browser”
or “Save an image in a paint application”
The set of methods related to the feature is likely to be:
– (i) conceptually cohesive
– (ii) decoupled from those of other features
– (iii) sequentially invoked
Proposed approach
Step I – System instrumentation
Step II – Execution trace collection
Step III – Trace pruning and compression
Step IV – Textual analysis of methods’
source code
Step V – Search-based concept
identification
Step I and Step II – Getting Traces
Step I - System instrumentation
System instrumented using the MoDeC instrumentor
– MoDeC tool to extract and model sequence diagrams for
Java systems
– Java bytecode instrumentation tool
– Inserts dedicated method invocations into the system at
method/constructor entry and exit points
– Allows for trace tagging
Step II - Execution trace collection
We exercise a system following operation sequences
taken from user manuals or use case descriptions
Step III – Trace Pruning and Compression
Removing methods not very useful for feature identification
Methods occurring in many scenarios
– Are often utility methods
– We apply the same idea as tf-idf in Information Retrieval
Too frequent methods
– Could be for example related to crosscutting concerns
– We remove methods having a frequency greater than
Q3 + 2 × IQR (75th percentile + 2 × the interquartile range)
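The frequency-based pruning rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "frequency" means the number of occurrences of a method in one trace and that methods above the threshold Q3 + 2 × IQR are dropped. The function name `prune_frequent_methods` and the parameter `k` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def prune_frequent_methods(trace, k=2.0):
    """Drop calls to methods whose frequency exceeds Q3 + k * IQR.

    Assumption: frequency = number of occurrences of the method in this trace.
    """
    counts = Counter(trace)
    if len(counts) < 2:
        return list(trace)  # quartiles need at least two data points
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = quantiles(counts.values(), n=4)
    threshold = q3 + k * (q3 - q1)
    return [m for m in trace if counts[m] <= threshold]


# Eight ordinary methods called 1 to 3 times, plus a logger called 50 times.
trace = ["a", "b"] + ["c", "d", "e"] * 2 + ["f", "g", "h"] * 3 + ["log"] * 50
pruned = prune_frequent_methods(trace)
```

With very few distinct methods a single outlier can inflate Q3 itself, so the rule is more meaningful on traces with many distinct methods.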
Trace compression
Aim: collapse repetitions in execution traces
Purpose: reduce the search space for Step V
Examples:
– m1(); m1(); m1(); → m1();
– m1(); m2(); m1(); m2(); → m1(); m2();
Performed using Run Length Encoding (RLE)
Applied to sub-sequences of arbitrary length
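A possible way to collapse repetitions of sub-sequences of arbitrary length is sketched below. It is an assumption-laden illustration: it keeps a single copy of each immediately repeated block (as in the two examples) rather than storing repetition counts, and it repeats the pass until nothing changes. The name `collapse_repetitions` and the parameter `max_len` are hypothetical.

```python
def collapse_repetitions(trace, max_len=None):
    """Collapse immediately repeated sub-sequences, keeping a single copy."""
    trace = list(trace)
    while True:
        out, i, changed = [], 0, False
        # A repeated block can be at most half as long as the trace.
        limit = max_len or len(trace) // 2
        while i < len(trace):
            for w in range(1, limit + 1):
                block = trace[i:i + w]
                if len(block) == w and trace[i + w:i + 2 * w] == block:
                    # Skip every further consecutive copy of this block.
                    j = i + w
                    while trace[j:j + w] == block:
                        j += w
                    out.extend(block)
                    i, changed = j, True
                    break
            else:
                # No repetition starts here: keep the call as is.
                out.append(trace[i])
                i += 1
        if not changed:
            return out
        trace = out  # another pass may expose new repetitions


short = collapse_repetitions(["m1", "m1", "m1"])
pairs = collapse_repetitions(["m1", "m2", "m1", "m2"])
```

Trying the shortest block length first is a design choice of this sketch; loops whose iterations differ slightly are not collapsed.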
Step IV
Conceptual cohesion and coupling determined according
to [Marcus et al., 2008] and [Poshyvanyk et al., 2006]
Index identifiers and comments contained in methods
Extraction of identifiers and comment words
Camel-case splitting of composed identifiers
Stop word removal (English + Java keywords)
Stemming using the Porter stemmer
Indexing using tf-idf
Reduce the term-document space into a (smaller) concept-document
space using Latent Semantic Indexing (LSI)
– Helps to cope with synonymy and homonymy
– Concept space dimensionality: 50
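The first steps of the textual analysis can be sketched with the standard library only. This sketch covers word extraction, camel-case splitting, stop word removal, and tf-idf weighting; Porter stemming and the LSI reduction are left out because they need external libraries. The stop list, the function names, and the plain tf × log(N/df) weighting are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list; a real one would cover English and all Java keywords.
STOP_WORDS = {"the", "a", "of", "to", "and", "in",
              "public", "void", "return", "new", "int"}


def tokenize(source):
    """Extract words, split camel case, lowercase, and drop stop words."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", source):
        # "drawRectangle" -> ["draw", "Rectangle"]; "XMLParser" -> ["XML", "Parser"]
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return [p.lower() for p in parts if p.lower() not in STOP_WORDS]


def tf_idf(documents):
    """Map each method name to a {term: weight} vector, weight = tf * log(N / df)."""
    tokens = {name: tokenize(src) for name, src in documents.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(documents)
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        vectors[name] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


# Hypothetical method bodies, one "document" per method.
methods = {
    "drawRectangle": "public void drawRectangle() { // draw the rectangle figure\n"
                     "  canvas.addFigure(rect); }",
    "saveImage": "public void saveImage() { // save the image to a file\n"
                 "  writer.writeFile(image); }",
}
vectors = tf_idf(methods)
```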
Step V
We use a search-based optimization technique based on Genetic
Algorithms (GA) to split traces into segments
Representation: a bit-vector where 1 indicates the end of a segment
m1 m2 m1 m3 m4 m1 m4 m6 m1
0 1 0 0 1 0 0 0 1
Mutation: randomly flips a bit (i.e., splits or merges segments)
Crossover: two-point
Selection: roulette wheel
[Figure: bit-vector examples of mutation and two-point crossover, e.g., parents 0 1 0 0 1 0 0 0 1 and 0 0 1 0 0 1 0 0 1 yielding offspring 0 1 0 0 0 1 0 0 1 and 0 0 1 0 1 0 0 0 1]
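The representation and the genetic operators can be illustrated with a short sketch. The function names are hypothetical, and the crossover cut points (4 and 6) are an assumption chosen because they reproduce the offspring bit vectors shown on the slide from its two parent vectors.

```python
import random


def decode(trace, bits):
    """Split a trace into segments; a 1 at position i ends a segment after trace[i]."""
    segments, current = [], []
    for call, bit in zip(trace, bits):
        current.append(call)
        if bit == 1:
            segments.append(current)
            current = []
    if current:  # trailing calls when the last bit is 0
        segments.append(current)
    return segments


def mutate(bits, rng=random):
    """Flip one randomly chosen bit, which splits a segment or merges two."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def two_point_crossover(p1, p2, a, b):
    """Swap the slice [a, b) between two parents of equal length."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


trace = "m1 m2 m1 m3 m4 m1 m4 m6 m1".split()
parent1 = [0, 1, 0, 0, 1, 0, 0, 0, 1]
parent2 = [0, 0, 1, 0, 0, 1, 0, 0, 1]
segments = decode(trace, parent1)
child1, child2 = two_point_crossover(parent1, parent2, 4, 6)
```

Here `decode(trace, parent1)` yields the three segments m1 m2, m1 m3 m4, and m1 m4 m6 m1.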
Step V – Quality of the Solution
Fitness Function:
Segment Cohesion is the average (textual) similarity
between any pair of methods in a segment
Segment Coupling is the average (textual) similarity
between a segment and all other segments in the trace
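The two measures can be sketched as follows, under stated assumptions: textual similarity is taken to be the cosine similarity between sparse term vectors, coupling is read as the average similarity between the methods of a segment and the methods of all other segments, and a single-method segment gets cohesion 1. How cohesion and coupling are combined into the fitness value is not shown here, so the sketch stops at the two measures. All names and the example vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def cohesion(segment, vectors):
    """Average similarity over all pairs of distinct methods in the segment."""
    pairs = list(combinations(sorted(set(segment)), 2))
    if not pairs:
        return 1.0  # assumption: a single-method segment is trivially cohesive
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def coupling(index, segments, vectors):
    """Average similarity between methods of one segment and methods of all others."""
    inside = set(segments[index])
    outside = {m for i, s in enumerate(segments) if i != index for m in s} - inside
    pairs = [(a, b) for a in inside for b in outside]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical term vectors for four methods in two segments.
vectors = {
    "draw": {"figure": 1.0, "draw": 1.0},
    "paint": {"figure": 1.0, "paint": 1.0},
    "save": {"file": 1.0},
    "write": {"file": 1.0, "write": 1.0},
}
segments = [["draw", "paint"], ["save", "write"]]
```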
Other GA parameters
200 individuals
2,000 generations for JHotDraw and 3,000 for ArgoUML
5% mutation probability, 70% crossover probability
Distributed GA implementation (across 4 servers)
Empirical Study
• Goal: analyze the novel concept location approach based
• Purpose: evaluate its capability of identifying
meaningful concepts
• Quality focus: accuracy and completeness of the
identified concepts
• Context: an implementation of our approach and
execution traces extracted from two open source
systems, JHotDraw and ArgoUML
Research Questions
RQ1: How stable is the GA, across
multiple runs, when identifying concepts
in execution traces?
RQ2: To what extent do the identified
concepts match the ones in the oracle?
RQ3: How accurate is the identification of
concepts in execution traces?
RQ1: GA stability
We compute the overlap between segmentations
obtained in multiple runs using the Jaccard overlap score
Two segments overlap when they contain calls in the same position
of the trace
Because a segment of trace T1 can overlap with several segments of T2,
the highest similarity is chosen
[Figure: the trace m1 m2 m1 m3 m4 m1 m4 m6 m1 as segmented in Run 1 and in Run 2, with per-segment Jaccard scores 2/4, 3/4, and 2/3]
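The stability measure can be sketched as below, assuming segments are compared as sets of trace positions and that each segment of the first run is scored against its best-matching segment in the second run. The two bit vectors are made up for illustration and are not the segmentations of the figure; the function names are hypothetical.

```python
def segments_as_position_sets(bits):
    """Turn a bit-vector (1 = end of segment) into sets of trace positions."""
    segments, current = [], set()
    for pos, bit in enumerate(bits):
        current.add(pos)
        if bit == 1:
            segments.append(current)
            current = set()
    if current:  # trailing positions when the last bit is 0
        segments.append(current)
    return segments


def jaccard(a, b):
    """Jaccard overlap: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


def best_overlaps(run1_bits, run2_bits):
    """For each segment of run 1, the highest Jaccard score against run 2."""
    s1 = segments_as_position_sets(run1_bits)
    s2 = segments_as_position_sets(run2_bits)
    return [max(jaccard(a, b) for b in s2) for a in s1]


run1 = [0, 1, 0, 0, 1, 0, 0, 0, 1]  # segments {0,1}, {2,3,4}, {5,6,7,8}
run2 = [0, 0, 1, 0, 1, 0, 0, 0, 1]  # segments {0,1,2}, {3,4}, {5,6,7,8}
scores = best_overlaps(run1, run2)
average = sum(scores) / len(scores)
```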
RQ1: Results
Average overlap between 72% and 84%
Slightly higher convergence for ArgoUML
Ability of the algorithm to converge, despite the
relatively large search space
RQ2: Matching with the Oracle
We manually tag start-end of features while
executing the system
Using the MoDeC instrumentation tool
While executing the instrumented system, the user triggers the
introduction of <Start> and <Stop> tags in the trace
Matching between identified traces and oracle
computed as in RQ1
[Figure: the trace m1 m2 m1 m3 m4 m1 m4 m6 m1 as segmented in Run 1 and in the Oracle, with per-segment overlap scores 2/4, 3/4, and 2/3]
RQ2: Results
High overlap for some features
e.g., Draw rectangle or Draw circle
Lower for features obtained by adapting other ones
e.g., Add text obtained by adapting Draw rectangle
In other cases, low overlap is due to large segments
being split into smaller, more cohesive ones
RQ3: Accuracy in trace identification
Computed similarly to RQ2; however, we use
precision instead of the Jaccard overlap score
[Figure: the trace m1 m2 m1 m3 m4 m1 m4 m6 m1 as segmented in Run 1 and in the Oracle, with per-segment precision values 2/3, 3/4, and 2/2]
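Precision can be sketched in the same style as the overlap score. The sketch assumes that the precision of an identified segment is the fraction of its trace positions that fall inside an oracle segment, and that the best-matching oracle segment is used. The segment boundaries below are made up for illustration; the function names are hypothetical.

```python
def precision(identified, oracle):
    """Fraction of the positions in an identified segment that lie in the oracle segment."""
    return len(identified & oracle) / len(identified)


def best_precisions(identified_segments, oracle_segments):
    """For each identified segment, the highest precision against any oracle segment."""
    return [max(precision(s, o) for o in oracle_segments)
            for s in identified_segments]


# Segments given as sets of positions in a 9-call trace (hypothetical boundaries).
identified = [{0, 1, 2}, {3, 4}, {5, 6, 7, 8}]
oracle = [{0, 1}, {2, 3, 4}, {5, 6, 7}]
scores = best_precisions(identified, oracle)
```

Unlike the Jaccard score, precision does not penalize an identified segment for being smaller than the oracle segment that contains it, which is why a fully contained segment scores 1.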
RQ3: Results
Precision often very high
In most cases above 85% and often equal to 100%
Low precision (mean 32%) for Add text
Relatively low (mean 69%) for Draw rectangle
These two features are difficult to distinguish
Inspection of the obtained segments
Add class (ArgoUML)
The approach split this long feature, a sequence of 199 method calls, into 5 segments
related to sub-features (creation of objects, adding the project class, handling
namespace, setting object properties, handling persistence of the diagram)
Create note (ArgoUML)
Only the first part (50 methods) of the trace composed of 88 calls was identified
Problems related to multi-threading
Problems related to collapsing (during compression) loops containing variants
Cut rectangle (JHotDraw)
Only the last 39 out of 172 calls were included in the segment
Methods related to adding to the clipboard and showing the rectangle as “cut”
The first methods were related to GUI events and were split into many small segments
Spawn window (JHotDraw)
72 out of 197 methods included
The remaining ones were related to setting up menu command properties
Threats to Validity
Construct validity (relation between theory and observation)
Multi-threading can change the ordering of calls in multiple
executions of the same scenario
A better assessment of the actual content of the obtained
segments is needed
Internal validity (presence of confounding factors)
Trace tagging may be imprecise, again due to multi-threading
Noise due to utility methods
GA intrinsic randomness
External validity (generalization of findings)
We analyzed two different systems, multiple traces
As usual, further empirical evaluation is needed
Conclusions
We proposed a search-based approach to automatically locate
concepts in execution traces
By splitting traces into conceptually cohesive and decoupled segments
Empirical study on traces from JHotDraw and ArgoUML shows that
The approach is stable
Identified segments are highly precise
Finer splitting with respect to high-level features
Limitations due to multi-threading, GUI events, feature adaptation
Work-in-progress:
Improve performance
Use enhanced compression techniques
Automatically label identified concepts
Perform an extensive empirical validation
Thank You!
Questions?

  • 1. A Heuristic-based Approach to Identify Concepts in Execution Traces CSMR 2010 - Madrid (Spain) 1 Fatemeh Asadi* Massimiliano Di Penta** Giuliano Antoniol* Yann-Gaël Guéhéneuc** * Ecole Polytechnique de Montréal, Canada ** Dept. Of Engineering – Univ. of Sannio, Italy
  • 2. Motivations • Software systems lack adequate documentation • Developers try to understand systems through – Static analyses, visualizations built upon static data – Dynamic analyses, requiring the execution of the system • (Dynamic) concept identification Identify sets of method calls in execution traces responsible CSMR 2010 - Madrid (Spain) 2 – Identify sets of method calls in execution traces responsible for the implementation of domain concepts or user-observable features – Existing approaches based on static analysis [Anquetil and Lethbridge (1998)], dynamic analysis [Wilde and Scully (1995) Tonella and Ceccato (2004)], IR techniques [Poshyvanyk et al. (2007)], or hybrid ones [Eaddy et al. (2008)]
  • 3. Proposed approach A novel approach that analyzes execution traces and groups together method calls that: (i) sequentially invoked together/in sequence (ii) cohesive and decoupled from a conceptual point of view Assumptions CSMR 2010 - Madrid (Spain) 3 Let us consider a feature is being executed in a scenario – e.g., “Open a Web page from a browser” or “Save an image in a paint application” The set of methods related to the feature is likely to be: – (i) conceptually cohesive – (ii) decoupled from those of other features – (iii) sequentially invoked
  • 4. Proposed approach Step I – System instrumentation Step II – Execution trace collection Step III – Trace pruning and compression Step IV – Textual analysis of methods’ CSMR 2010 - Madrid (Spain) 4 Step IV – Textual analysis of methods’ source code Step V – Search-based concept identification
  • 5. Step I and Step II – Getting Traces Step I - System instrumentation System instrumented using the MoDeC instrumentor – MoDeC tool to extract and model sequence diagrams for Java systems Java bytecode instrumentation tool CSMR 2010 - Madrid (Spain) 5 – Inserts appropriate and dedicated method invocations in the system to method/constructor entry/exit, points – Allows for trace tagging Step II - Execution trace collection We exercise a system following operation sequences taken from user manuals or use case descriptions
  • 6. Step III – Trace Pruning and Compression Removing methods not very useful for feature identification Methods occurring in many scenarios – Are often utility methods – We use the same idea of tf-idf in Information Retrieval Too frequent methods – Could be for example related to crosscutting concerns – We remove methods having a frequency CSMR 2010 - Madrid (Spain) 6 – We remove methods having a frequency Q3 + 2 ×××× IQR (75% percentile + 2 × the interquartile range) Trace compression Aim: collapse repetitions in execution traces Purpose: reduce the search space for Step V Examples: – m1(); m1(); m1(); – m1(); m2(); m1(); m2(); Performed using the Run Length Encoding (RLE) Applied for sub-sequences having an arbitrary length m1(); m1; m2();
  • 7. Step IV Conceptual cohesion and coupling determined according to [Marcus et al., 2008] and [Poshyvanyk et al., 2006] Index identifiers, comments contained in methods Extraction of identifiers and comment words Camel-case splitting of composed identifiers Stop word removal (English + Java keywords) CSMR 2010 - Madrid (Spain) 7 Stop word removal (English + Java keywords) Stemming using the Porter stemmer Indexing using tf-idf Reduce the term-document space into a (smaller) concept- document space using Latent Semantic Indexing (LSI) – Helps to cope with synonymy and homonymy – Concept space=50
  • 8. Step V We use a search-based optimization technique based on Genetic Algorithms (GA) to split traces into segments Representation: a bit-vector where 1 indicates the end of a segment Mutation: randomly flips a bit (i.e., splits or merge segments) m1 m2 m1 m3 m4 m1 m4 m6 m1 0 1 0 0 1 0 0 0 1 Trace splitting Representation CSMR 2010 - Madrid (Spain) 8 Mutation: randomly flips a bit (i.e., splits or merge segments) Crossover: two-points Selection: Roulette Wheel 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 0 110 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1
  • 9. Step V – Quality of the Solution Fitness Function: Segment Cohesion is the average (textual) similarity between any pair of methods in a segment Segment Coupling is the average (textual) similarity CSMR 2010 - Madrid (Spain) 9 Segment Coupling is the average (textual) similarity between a segment and all other segments in the trace Other GA parameters 200 individuals 2,000 generations for JHotDraw and 3,000 for ArgoUML 5% mutation probability, 70% crossover probability Distributed GA implementation (across 4 servers)
  • 10. Empirical Study • Goal: analyze the novel concept location approach based • Purpose: of evaluating its capability of identifying meaningful concepts • Quality focus: accuracy and completeness of the identified concepts • Context: an implementation of our approach and execution traces extracted from two open source CSMR 2010 - Madrid (Spain) 10 execution traces extracted from two open source systems, JHotDraw and ArgoUML
  • 11. Research Questions RQ1: How stable is the GA, through multiple runs, when identifying concepts into execution traces? RQ2: To what extent the identified concepts match the ones in the oracle? CSMR 2010 - Madrid (Spain) 11 concepts match the ones in the oracle? RQ3: How accurate is the identification of concepts in execution traces?
  • 12. RQ1: GA stability. We compute the overlap between segmentations obtained in multiple runs using the Jaccard overlap score. Two segments overlap when they contain calls in the same position of the trace. Because a segment of trace T1 may overlap with several segments of T2, the highest similarity is chosen. [Figure: two segmentations (Run 1, Run 2) of the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, with overlap scores 2/4, 3/4, and 2/3]
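The overlap computation can be sketched by treating each segment as a set of trace positions. The two bit-vectors below are hypothetical segmentations of a nine-call trace, not data from the study, and the function names are chosen here for illustration.

```python
def segments_as_positions(bits):
    """Turn a bit-vector (1 = end of segment) into a list of sets of trace positions."""
    segments, current = [], set()
    for i, bit in enumerate(bits):
        current.add(i)
        if bit == 1:
            segments.append(current)
            current = set()
    if current:  # trailing positions when the last bit is 0
        segments.append(current)
    return segments


def jaccard(a, b):
    """Jaccard overlap: shared positions divided by positions covered by either segment."""
    return len(a & b) / len(a | b)


def best_overlaps(run1, run2):
    """For each segment of run1, keep its highest Jaccard score against run2's segments."""
    return [max(jaccard(s, t) for t in run2) for s in run1]


# Two hypothetical segmentations of the same nine-call trace.
run1 = segments_as_positions([0, 1, 0, 0, 1, 0, 0, 0, 1])
run2 = segments_as_positions([0, 0, 1, 0, 0, 1, 0, 0, 1])
scores = best_overlaps(run1, run2)
# scores are 2/3, 2/4 and 3/4 for the three segments of run1
```

Averaging such per-segment scores over pairs of runs gives a single stability figure for a trace.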
  • 13. RQ1: Results. Average overlap between 72% and 84%. Slightly higher convergence for ArgoUML. The algorithm is able to converge despite the relatively large search space.
  • 14. RQ2: Matching with the Oracle. We manually tag the start and end of features while executing the system, using the MoDeC instrumentation tool: while executing the instrumented system, the user triggers the introduction of <Start> and <Stop> tags in the trace. Matching between identified traces and oracle is computed as in RQ1. [Figure: Run 1 and Oracle segmentations of the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, with overlap scores 2/4, 3/4, and 2/3]
  • 15. RQ2: Results. High overlap for some features, e.g., Draw rectangle or Draw circle. Lower overlap for features obtained by adapting other ones, e.g., Add text, obtained by adapting Draw rectangle. In other cases, low overlap is due to large segments being split into several smaller, more cohesive ones.
  • 16. RQ3: Accuracy in trace identification. Computed similarly to RQ2; however, we use Precision instead of the Jaccard overlap score. [Figure: Run 1 and Oracle segmentations of the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, with precision scores 2/3, 3/4, and 2/2]
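A minimal sketch of the precision variant, assuming precision is the share of an identified segment's positions that fall inside an oracle segment and that, as with the Jaccard score, the best-matching oracle segment is kept. Both the segment sets and the function names are hypothetical.

```python
def precision(identified, oracle):
    """Fraction of the identified segment's positions that also lie in the oracle segment."""
    return len(identified & oracle) / len(identified)


def best_precisions(identified_segments, oracle_segments):
    """For each identified segment, keep its highest precision over the oracle segments."""
    return [
        max(precision(s, o) for o in oracle_segments)
        for s in identified_segments
    ]


# Hypothetical segments, given as sets of positions in a nine-call trace.
identified = [{0, 1}, {2, 3, 4}, {5, 6, 7, 8}]
oracle = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
precisions = best_precisions(identified, oracle)
# precisions are 2/2, 2/3 and 3/4 for the three identified segments
```

Unlike the Jaccard score, precision does not penalize an identified segment for being smaller than the oracle segment, which is why a segment fully contained in a tagged feature scores 100%.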
  • 17. RQ3: Results. Precision is often very high: in most cases above 85%, and often equal to 100%. Low precision (mean 32%) for Add text; relatively low (mean 69%) for Draw rectangle. These two features are difficult to distinguish.
  • 18. Inspection of the obtained segments. Add class (ArgoUML): the approach split this long feature, a sequence of 199 methods, into 5 segments related to sub-features (creation of objects, adding the project class, handling namespace, setting object properties, handling persistence of the diagram). Create note (ArgoUML): only the first part (50 methods) of the trace composed of 88 calls was identified; problems related to multi-threading and to collapsing (during compression) loops containing variants. Cut rectangle (JHotDraw): only the last 39 out of 172 calls were included in the segment, namely methods related to adding to the clipboard and showing the rectangle as “cut”; the first methods were related to GUI events and were split into many small segments. Spawn window (JHotDraw): 72 out of 197 methods were included; the remaining ones were related to setting up menu command properties.
  • 19. Threats to Validity. Construct validity (relation between theory and observation): multi-threading can change the ordering of calls in multiple executions of the same scenario; a better assessment of the actual content of the obtained segments is needed. Internal validity (presence of confounding factors): trace tagging may be imprecise, again due to multi-threading; noise due to utility methods; GA intrinsic randomness. External validity (generalization of findings): we analyzed two different systems and multiple traces; as usual, further empirical evaluation is needed.
  • 20. Conclusions. We proposed a search-based approach to automatically locate concepts in execution traces by splitting traces into conceptually cohesive and decoupled segments. The empirical study on traces from JHotDraw and ArgoUML shows that: the approach is stable; the identified segments are highly precise; the splitting is finer than the high-level features. Limitations are due to multi-threading, GUI events, feature adaptation, ... Work in progress: improve performance; use enhanced compression techniques; automatically label identified concepts; perform an extensive empirical validation.
  • 21. Thank You! Questions?