Comparing Software Architecture Recovery
Using Accurate Dependency
Presented by
Imdad-Ul-Haq
ID# 7661
Introduction
Software architecture is crucial for program comprehension,
programmer communication, and software maintenance. Unfortunately,
documented software architectures are either nonexistent or outdated
for many software projects. While it is important for developers to
document software architecture and keep it up-to-date, it is costly and
difficult. Software grows larger every day, so it is hard for a developer
to have complete knowledge of a system. Many techniques have been
proposed for software architecture recovery.
Problem Statement
Developers use different techniques for software architecture recovery,
and each is effective to some degree. To understand their
effectiveness, we compare these techniques. Chromium is
a very large open source software project containing 10 MLOC, and
recovering its architecture manually takes a lot of time, even for an
experienced recoverer.
Methods For Comparison
Include Dependency:
Include dependencies are inaccurate. For example, file foo.c may declare that it includes
bar.h, but may not use any functions or variables declared or defined in bar.h. Using include
dependencies, one would conclude that foo.c depends on bar.h, while foo.c has no actual code
dependency on bar.h.
Symbol Dependency:
Symbol dependencies are more accurate. A symbol can be a function name or a global
variable. For example, consider two files Alpha.c and Beta.c: file Alpha.c
contains method A, and file Beta.c contains method B. If method A invokes method B, then
method A depends on method B. Based on this information, we can conclude that file Alpha.c
depends on file Beta.c.
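The Alpha.c / Beta.c reasoning can be sketched in a few lines of Python. This is only an illustration: the function name `file_dependencies` and the two input structures are assumptions made for the example, not part of any tool named in the slides.

```python
from collections import defaultdict


def file_dependencies(symbol_to_file, symbol_calls):
    """Lift symbol-level dependencies to file-level dependencies.

    symbol_to_file: maps each symbol name to the file that defines it.
    symbol_calls:   iterable of (caller_symbol, callee_symbol) pairs.
    """
    deps = defaultdict(set)
    for caller, callee in symbol_calls:
        src = symbol_to_file[caller]
        dst = symbol_to_file[callee]
        if src != dst:  # a call inside one file is not a file dependency
            deps[src].add(dst)
    return {f: sorted(targets) for f, targets in deps.items()}


# Method A (in Alpha.c) invokes method B (in Beta.c).
symbol_to_file = {"A": "Alpha.c", "B": "Beta.c"}
symbol_calls = [("A", "B")]
print(file_dependencies(symbol_to_file, symbol_calls))
```

A file that merely includes a header but calls nothing from it contributes no pair to `symbol_calls`, so it produces no dependency here, which is the difference from include dependencies described above.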
Approaches
Our approach consists of two parts: the extraction of dependencies and obtaining a
ground-truth architecture. In the first part we extract both kinds of dependency, include
dependencies and symbol dependencies, for C/C++ projects and for Java projects.
1) Obtaining dependency for C/C++:
To extract symbol dependencies, we use a technique that compiles a project's source
files with LLVM into bitcode, analyzes the bitcode to extract the symbol dependencies for all
symbols inside the project, and groups dependencies based on the files containing the symbols.
Many header files contain only symbol declarations, so to ensure we do not miss these
dependencies we also analyze the #include statements in the source code. Include dependencies
are transitive dependencies.
Approaches
2) Obtaining dependency for Java Projects:
To extract symbol dependencies, we leverage a tool that operates at the Java bytecode level
and extracts high-level information from the bytecode in a human readable format. We extract
include dependencies for Java projects from import statements in Java source code by utilizing a
script to determine imports and their associated files. The script used to extract the
dependencies detects all the files in a package. Then for every file, it evaluates each import
statement and adds the files mentioned in the import as a dependency. When a wildcard import
is evaluated, all classes in the referred package are added as dependencies. For Java projects,
include dependencies miss relationships between files because they do not account for
intra-package dependencies or fully-qualified name usage.
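The import-based extraction described above could look roughly like the sketch below. The slides do not show the actual script, so the regular expression, the function name `include_dependencies`, and the `classes_by_package` lookup table are all assumptions for illustration; static imports are deliberately ignored to keep the example short.

```python
import re

# Matches "import a.b.C;" and "import a.b.*;" (static imports are not matched).
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)(\.\*)?\s*;", re.MULTILINE)


def include_dependencies(sources, classes_by_package):
    """Derive file-level include dependencies from Java import statements.

    sources:            {file path: Java source text}
    classes_by_package: {package name: {class name: file path}}
    """
    deps = {}
    for path, text in sources.items():
        targets = set()
        for name, wildcard in IMPORT_RE.findall(text):
            if wildcard:
                # Wildcard import: every class of the package is a dependency.
                targets.update(classes_by_package.get(name, {}).values())
            else:
                package, _, cls = name.rpartition(".")
                target = classes_by_package.get(package, {}).get(cls)
                if target is not None:
                    targets.add(target)
        targets.discard(path)
        deps[path] = sorted(targets)
    return deps


sources = {
    "app/Main.java": (
        "package app;\n"
        "import util.Parser;\n"
        "import model.*;\n"
        "class Main { Helper h; }\n"  # same-package use, no import needed
    ),
}
classes_by_package = {
    "app": {"Main": "app/Main.java", "Helper": "app/Helper.java"},
    "util": {"Parser": "util/Parser.java"},
    "model": {"Node": "model/Node.java", "Edge": "model/Edge.java"},
}
print(include_dependencies(sources, classes_by_package))
```

In this example `app/Helper.java` never appears in the result even though `Main` uses it, which illustrates the missed intra-package dependency mentioned above.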
Approaches
3) Obtaining Ground truth architecture:
We use the sub module method to obtain the preliminary ground-truth architecture.
It consists of three steps. First, we determine the module that each file belongs to by analyzing
the configuration files of the project. Second, we determine the sub module relationships
between modules. We define a sub module as a module that has all of its files contained within
the subdirectory of another module.
Continue
Third, we group modules that are sub modules of one another into a cluster. In the example,
we cluster modules A, B and C into a single cluster. This preliminary version of the ground-truth
architecture does not accurately reflect the "real" architecture of the project.
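The sub module grouping can be sketched as follows. The sketch assumes each module has a known root directory and a list of files (both hypothetical inputs, standing in for what would be read from the project's configuration files), and it interprets "contained within the subdirectory of another module" as "located under the other module's root directory".

```python
from pathlib import PurePosixPath


def is_submodule(files, parent_root):
    """True if every file lies somewhere under parent_root."""
    root = PurePosixPath(parent_root)
    return bool(files) and all(root in PurePosixPath(f).parents for f in files)


def cluster_modules(module_roots, module_files):
    """Group modules connected by sub module relationships (union-find)."""
    parent = {m: m for m in module_roots}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for child in module_roots:
        for other in module_roots:
            if child != other and is_submodule(module_files[child], module_roots[other]):
                parent[find(child)] = find(other)

    clusters = {}
    for m in module_roots:
        clusters.setdefault(find(m), set()).add(m)
    return sorted(sorted(c) for c in clusters.values())


module_roots = {"A": "src/a", "B": "src/a/b", "C": "src/a/b/c", "D": "src/d"}
module_files = {
    "A": ["src/a/main.c"],
    "B": ["src/a/b/b.c"],
    "C": ["src/a/b/c/c.c"],
    "D": ["src/d/d.c"],
}
print(cluster_modules(module_roots, module_files))
```

With these made-up paths, B and C are sub modules of A and end up in one cluster with it, while D stays on its own.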
Techniques
We use six algorithms: four use dependencies to determine the clusters, while the
remaining two use textual information from source code.
1) ACDC (Algorithm for Comprehension-Driven Clustering):
ACDC performed well, and the main pattern it uses is called the "sub-graph dominator
pattern". ACDC detects a dominator node n_0 and a set of nodes N = {n_i | i ∈ ℕ} that n_0
dominates. ACDC groups the nodes of such a sub-graph together into a cluster.
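One way to picture the dominated set is the sketch below. It is not ACDC's implementation; it assumes that "n_0 dominates N" means every node in N is reachable from n_0 and no edge enters N from a node outside N and n_0. The function name `dominated_set` and the example graph are invented for illustration.

```python
def dominated_set(graph, n0):
    """Return the nodes dominated by n0 under the assumption stated above.

    graph: {node: set of successor nodes} (a dependency graph).
    """
    # Candidates: everything reachable from n0, excluding n0 itself.
    candidates, stack = set(), [n0]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ != n0 and succ not in candidates:
                candidates.add(succ)
                stack.append(succ)

    # Predecessor sets, used to detect edges entering from outside.
    preds = {}
    for node, succs in graph.items():
        for succ in succs:
            preds.setdefault(succ, set()).add(node)

    # Repeatedly drop candidates that have a predecessor outside the sub-graph.
    changed = True
    while changed:
        changed = False
        for node in list(candidates):
            if any(p != n0 and p not in candidates for p in preds.get(node, ())):
                candidates.discard(node)
                changed = True
    return candidates


graph = {
    "main": {"util", "io"},
    "util": {"fmt"},
    "io": {"fmt"},
    "other": {"io"},  # an outside node that also depends on io
}
print(dominated_set(graph, "main"))
```

Here only `util` is dominated by `main`: `io` is also used by `other`, and `fmt` is reached through `io`, so neither would be grouped with `main` under this reading of the pattern.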
2) Bunch:
Bunch is a technique that transforms the architecture recovery problem into an optimization
problem. An optimization function called Modularization Quality (MQ) represents the quality of
a recovered architecture. Bunch uses hill-climbing and genetic algorithms to find a partition (i.e.,
a grouping of software entities into clusters) that maximizes MQ.
Techniques
3) WCA (Weighted Combined Algorithm):
WCA is a hierarchical clustering algorithm that measures the inter-cluster distance between
software entities and merges them into clusters based on this distance. Two measures are
proposed to measure the inter-cluster distance: Unbiased Ellenberg (UE) and Unbiased
Ellenberg-NM (UENM). The main difference between these measures is that UENM integrates
more information into the measure and thus might obtain better results.
4) LIMBO:
LIMBO is also a hierarchical clustering algorithm that aims to make the Information Bottleneck
algorithm scalable for large data sets.
Techniques
5) ARC (Architecture Recovery using Concerns):
ARC is a hierarchical clustering algorithm that relies on information retrieval and machine
learning to perform a recovery. This technique does not use dependencies and is therefore not
used to evaluate the influence of different levels of dependencies.
6) ZBR (Zone Based Recovery):
ZBR is a recovery technique based on natural language semantics of identifiers found in the
source code. ZBR demonstrated accuracy in recovering Java package structure but struggled
with memory issues when dealing with larger systems.
Analyzed Projects
We use five large-scale open source projects: Chromium, Bash, ArchStudio, ITK and Hadoop.
For Bash, Hadoop, and ArchStudio, all techniques take a few seconds
to a few minutes to run. For large projects such as ITK and Chromium, each technique takes
several hours or days to run.
Accuracy Measures
A recovered architecture may be different from a ground-truth architecture, but close to
another ground-truth architecture of the same project. We used several metrics to measure the
accuracy of the architectures recovered for the above projects from include and symbol dependencies.
1) TurboMQ:
TurboMQ is independent of any ground-truth architecture and measures the quality
of a recovered architecture.
In its definition, μ_i is the number of intra-relationships of cluster i, and ε_ij + ε_ji is the
number of inter-relationships between cluster i and cluster j.
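The formula itself is not present in this transcript. As a hedged sketch, the code below assumes the form commonly given for TurboMQ: a cluster factor CF_i = μ_i / (μ_i + 0.5 · Σ_j (ε_ij + ε_ji)) for each cluster (taken as 0 when μ_i = 0), with TurboMQ being the sum of CF_i over all clusters. The function name and input structures are chosen for this example only.

```python
from collections import defaultdict


def turbo_mq(edges, cluster_of):
    """Compute TurboMQ under the assumed cluster-factor definition.

    edges:      iterable of (source, target) dependencies between entities.
    cluster_of: maps each entity to the id of its cluster.
    """
    intra = defaultdict(int)  # mu_i: edges with both ends in cluster i
    inter = defaultdict(int)  # sum over j of (eps_ij + eps_ji) for cluster i
    for src, dst in edges:
        ci, cj = cluster_of[src], cluster_of[dst]
        if ci == cj:
            intra[ci] += 1
        else:
            inter[ci] += 1
            inter[cj] += 1

    total = 0.0
    for c in set(cluster_of.values()):
        mu = intra[c]
        if mu > 0:  # a cluster without intra-edges contributes 0
            total += mu / (mu + 0.5 * inter[c])
    return total


cluster_of = {"a": 1, "b": 1, "c": 2, "d": 2}
edges = [("a", "b"), ("c", "d"), ("b", "c")]
print(turbo_mq(edges, cluster_of))
```

In the example each cluster has one intra-edge and shares one inter-edge, so each cluster factor is 1 / 1.5 and the total is 4/3.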
Accuracy Measures
2) MoJoFM:
MoJoFM is defined in terms of mno(A, B), the minimum number of Move or Join operations
needed to transform the recovered architecture A into the ground truth B. A score of 100%
indicates that the recovered architecture is the same as the ground-truth architecture.
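The formula is not present in this transcript. The form in which MoJoFM is usually stated is given below as a reference sketch; the term max(mno(∀A, B)), the largest possible distance from any architecture to B, is an assumption brought in from that usual definition and does not appear in the text above.

```latex
\[
\mathrm{MoJoFM}(A, B) =
  \left( 1 - \frac{mno(A, B)}{\max\bigl(mno(\forall A, B)\bigr)} \right) \times 100\%
\]
```

Under this form, mno(A, B) = 0 gives exactly 100%, which agrees with the statement that a score of 100% means the two architectures are the same.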
3) Architecture to Architecture:
This measure is defined in terms of mto(A_i, A_j), the minimum number of operations needed to transform architecture A_i into A_j, and aco(A_i),
the number of operations needed to construct architecture A_i from a "null" architecture A_∅.
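The formula is not present in this transcript. A commonly cited form of the architecture-to-architecture measure (often abbreviated a2a, a name assumed here) normalizes the transformation cost by the cost of building both architectures from scratch:

```latex
\[
a2a(A_i, A_j) =
  \left( 1 - \frac{mto(A_i, A_j)}{aco(A_i) + aco(A_j)} \right) \times 100\%
\]
```

With this form, identical architectures need no transformation operations and score 100%.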
Accuracy Measures
4) Cluster to Cluster:
The cluster-to-cluster coverage measure is defined over two architectures, where
A_1 is the recovered architecture, A_2 is a ground-truth architecture, and A_2.C are the clusters of
A_2. th_cvg is a threshold indicating how high the c2c value must be for a technique's cluster and a
ground-truth cluster in order to count the latter as covered.
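The coverage idea can be sketched in Python. The transcript does not define c2c itself, so the sketch assumes c2c is the number of shared entities divided by the size of the larger cluster, as a percentage; the function names are likewise chosen for this example. Coverage is then the share of ground-truth clusters that have at least one recovered cluster with c2c above th_cvg.

```python
def c2c(cluster_a, cluster_b):
    """Assumed overlap measure between two clusters (sets), in percent."""
    if not cluster_a or not cluster_b:
        return 0.0
    shared = len(cluster_a & cluster_b)
    return 100.0 * shared / max(len(cluster_a), len(cluster_b))


def c2c_coverage(recovered, ground_truth, th_cvg):
    """Percentage of ground-truth clusters covered by some recovered cluster."""
    if not ground_truth:
        return 0.0
    covered = [
        gt for gt in ground_truth
        if any(c2c(rc, gt) > th_cvg for rc in recovered)
    ]
    return 100.0 * len(covered) / len(ground_truth)


recovered = [{"a", "b", "c"}, {"d"}, {"e", "f"}]
ground_truth = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
print(c2c_coverage(recovered, ground_truth, th_cvg=50))
```

In this made-up example two of the three ground-truth clusters are covered at a 50% threshold; the cluster {c, d} is not, because its best match reaches exactly 50% and the comparison is strict.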
Results
Conclusion
The above experiment shows that symbol dependencies give more accurate results than include
dependencies. In general, each recovery technique extracted a better quality architecture when
using symbol dependencies instead of the less-detailed include dependencies. The general
conclusion is that the quality of the input affects the quality of the output. The results show that not only does
each recovery technique produce better output with better input, but also that the highest
scoring technique often changes when the input changes.
THANK YOU
