SlideShare a Scribd company logo
1 of 20
Comparing Software Architecture Recovery
Using Accurate Dependency
Presented by
Imdad-Ul-Haq
ID# 7661
Introduction
Software architecture is crucial for program comprehension,
programmer communication, and software maintenance. Unfortunately,
documented software architectures are either nonexistent or outdated
for many software projects. While it is important for developers to
document software architecture and keep it up-to-date, it is costly and
difficult. Software grows bigger day by day so it’s difficult for developer
to have complete knowledge about system. There are many techniques
proposed for software architecture recovery.
Problem Statement
Developer using different techniques for software architecture recovery
and they all have some effectiveness. Now to understand there
effectiveness we use comparison among these techniques. Chromium is
the largest open source software project which contains 10MLOC and
order to recover it by experienced recoverer then it’s take a lot of time
to recover.
Methods For Comparison
Include Dependency:
The include dependencies are inaccurate. For example, file foo.c may declare that it includes
bar.h, but may not use any functions or variables declared or defined in bar.h. Using include
dependencies, one would conclude that foo.c depends on bar.h, while foo.c has no actualcode
dependency on bar.h.
Symbol Dependency:
The symbol dependencies are more accurate. A sym-bol can be a function name or a global
variable. For example, consider two files Alpha.c and Beta.c: file Alpha.c
contains method A; and file Beta.c contains method B. If method A invokes method B, then
method A depends on method B. Based on this information, we can conclude that file Alpha.c
depends on file Beta.c.
Approaches
Our approach consists of two parts: the extraction of symbol dependencies and obtaining a
ground-truth architecture. In the second part we use our methods include dependency and
symbol dependency. The first one is used in C/C++ and the other one is used in Java.
1) Obtaining dependency for C/C++:
To extract symbol dependencies, we use the technique which is compiles a project’s source
files with LLVM into bit code, analyzes the bit code to extract the symbol dependencies for all
symbols inside the project, and groups dependencies based on the files containing the symbols.
Many header files contain only symbol declaration and to ensure we do not miss the
dependency we #include statement in the source code. Include dependencies are transitive
dependencies.
Approaches
2) Obtaining dependency for Java Projects:
To extract symbol dependencies, we leverage a tool that operates at the Java bytecode level
and extracts high-level information from the bytecode in a human readable format. We extract
include dependencies for Java projects from import statements in Java source code by utilizing a
script to determine imports and their associated files. The script used to extract the
dependencies detects all the files in a package. Then for every file, it evaluates each import
statement and adds the files mentioned in the import as a dependency. When a wildcard import
is evaluated, all classes in the referred package are added as dependencies. For Java projects,
include dependencies miss relationships between files because they do not account for intra-
package dependencies or fully-qualified name usage.
Approaches
3) Obtaining Ground truth architecture:
We using the sub module method for obtaining the preliminary ground truth architecture.
It consists of three steps. First, we determine the module that each file belongs to by analyzing
the configuration files of the project. Second, we determine the sub module relationship
between modules. We define a sub module as a module that has all of its files contained within
the subdirectory of another module.
Continue
We group modules that are sub modules of one another into a cluster. In the example above,
we cluster modules A, B and C into a single cluster. This preliminary version of the ground-truth
architecture does not accurately reflect the “real” architecture of the project.
Techniques
We use six algorithms in which four use dependency to determine the clusters while the
remaining two techniques use textual information from source code.
1) ACDC (Algorithm for Comprehension-Driven Clustering):
ACDC performed well and the main pattern used by it called “sub-graph dominator
pattern”. ACDC detects a dominator node n0 and a set of nodes N = {ni | i ∈ N } that n0
dominates. ACDC groups the nodes of such a sub-graph together into a cluster.
2) Bunch:
The Bunch technique that transforms the architecture recovery problem into an optimization
problem. An optimization function called Modularization Quality (MQ) represents the quality of
a recovered architecture. Bunch uses hill-climbing and genetic algorithms to find a partition (i.e.,
a grouping of software entities into clusters) that maximizes MQ.
Techniques
3) WCA (Weighted Combined Algorithm):
WCA is a hierarchical clustering algorithm that measures the inter-cluster distance between
software entities and merges them into clusters based on this distance. Two measures are
proposed to measure the inter-cluster distance: Unbiased Ellenberg (UE) and Unbiased
Ellenberg-NM (UENM). The main difference between these measures is that UENM integrates
more information into the measure and thus might obtain better results.
4) LIMBO:
LIMBO is also a hierarchical clustering algorithm that aims to make the Information Bottleneck
algorithm scalable for large data sets.
Techniques
5) ARC (Architecture Recovery using Concerns):
ARC is a hierarchical clustering algorithm that relies on information retrieval and machine
learning to perform a recovery. This technique does not use dependencies and is therefore not
used to evaluate the influence of different levels of dependencies.
6) ZBR (Zone Based Recovery):
ZBR is a recovery technique based on natural language semantics of identifiers found in the
source code. ZBR demonstrated accuracy in recovering Java package structure but struggled
with memory issues when dealing with larger systems .
Analyzed Projects
We use the 5 large scale and open source projects which included Chromium , Bash , Arch
Studio , ITK and Hadoop. For Bash, Hadoop, and Arch Studio, all techniques take a few seconds
to a few minutes to run. For large projects such as ITK and Chromium, each technique take
several hours or days to run.
Accuracy Measures
A recovered architecture may be different from a ground-truth architecture, but close to
another ground-truth architecture of the same project. We used some matrices to find the
accuracy of the above projects by include and symbol dependency methods.
1) TURBOMQ:
TURBOMQ is an independent of any ground-truth architecture, which calculates the quality
of the recovered architectures.
µ i is the number of intra-relationships; ɛij + ɛji is the number of inter-relationships between
cluster i and cluster j.
Accuracy Measures
2) MoJoFM:
MojoFM is denoted by the following formula
where mno(A, B) is the minimum number of Move or Join operations needed to transform the
recovered architecture A into the ground truth B. A score of 100% indicates that the architecture
recovered is the same as the ground-truth architecture.
3) Architecture to Architecture:
where mto(Ai , A j ) is the minimum number of operations needed to transform architecture Ai into A j ; and aco(Ai ) is
the number of operations needed to construct architecture Ai from a “null” architecture A∅.
Accuracy Measures
4) Cluster to Cluster:
The following formula denotes the cluster algorithm where
A1 is the recovered architecture; A2 is a ground-truth architecture; and A2. C are the clusters of
A2 . thcvg is a threshold indicating how high the c2c value must be for a technique’s cluster and a
ground-truth cluster in order to count the latter as covered.
Results
Results
Results
Conclusion
The above experiment shows that symbol dependency gives more accurate result then include
dependency. In general, each recovery technique extracted a better quality architecture when
using symbol dependencies instead of the less-detailed include dependencies. In general
conclusion that quality of input affects quality of output. The results show that not only does
each recovery technique produce better output with better input, but also that the highest
scoring technique often changes when the input changes.
THANK YOU

More Related Content

What's hot

FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...VLSICS Design
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskICSM 2011
 
Collaborative archietyped for ipv4
Collaborative archietyped for ipv4Collaborative archietyped for ipv4
Collaborative archietyped for ipv4Fredrick Ishengoma
 
Exploring Models of Computation through Static Analysis
Exploring Models of Computation through Static AnalysisExploring Models of Computation through Static Analysis
Exploring Models of Computation through Static Analysisijeukens
 
Novel high functionality fault tolerant ALU
Novel high functionality fault tolerant ALUNovel high functionality fault tolerant ALU
Novel high functionality fault tolerant ALUTELKOMNIKA JOURNAL
 
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...CSCJournals
 
Belief Propagation Decoder for LDPC Codes Based on VLSI Implementation
Belief Propagation Decoder for LDPC Codes Based on VLSI ImplementationBelief Propagation Decoder for LDPC Codes Based on VLSI Implementation
Belief Propagation Decoder for LDPC Codes Based on VLSI Implementationinventionjournals
 
Automatically Generated Simulations for Predicting Software-Defined Networkin...
Automatically Generated Simulations for Predicting Software-Defined Networkin...Automatically Generated Simulations for Predicting Software-Defined Networkin...
Automatically Generated Simulations for Predicting Software-Defined Networkin...Felipe Alencar
 
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...Raffi Khatchadourian
 
Reliability Improvement in Logic Circuit Stochastic Computation
Reliability Improvement in Logic Circuit Stochastic ComputationReliability Improvement in Logic Circuit Stochastic Computation
Reliability Improvement in Logic Circuit Stochastic ComputationWaqas Tariq
 
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMAN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMIJCSEA Journal
 
Scc2012 Scala
Scc2012 ScalaScc2012 Scala
Scc2012 Scalasteccami
 
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETSIMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETSIJCNCJournal
 
Chemistry development kit
Chemistry development kitChemistry development kit
Chemistry development kitAlichy Sowmya
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualizationtaeseon ryu
 

What's hot (19)

FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to Task
 
Collaborative archietyped for ipv4
Collaborative archietyped for ipv4Collaborative archietyped for ipv4
Collaborative archietyped for ipv4
 
Exploring Models of Computation through Static Analysis
Exploring Models of Computation through Static AnalysisExploring Models of Computation through Static Analysis
Exploring Models of Computation through Static Analysis
 
Novel high functionality fault tolerant ALU
Novel high functionality fault tolerant ALUNovel high functionality fault tolerant ALU
Novel high functionality fault tolerant ALU
 
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy...
 
Belief Propagation Decoder for LDPC Codes Based on VLSI Implementation
Belief Propagation Decoder for LDPC Codes Based on VLSI ImplementationBelief Propagation Decoder for LDPC Codes Based on VLSI Implementation
Belief Propagation Decoder for LDPC Codes Based on VLSI Implementation
 
Automatically Generated Simulations for Predicting Software-Defined Networkin...
Automatically Generated Simulations for Predicting Software-Defined Networkin...Automatically Generated Simulations for Predicting Software-Defined Networkin...
Automatically Generated Simulations for Predicting Software-Defined Networkin...
 
A57040102
A57040102A57040102
A57040102
 
IT6511 Networks Laboratory
IT6511 Networks LaboratoryIT6511 Networks Laboratory
IT6511 Networks Laboratory
 
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
 
Reliability Improvement in Logic Circuit Stochastic Computation
Reliability Improvement in Logic Circuit Stochastic ComputationReliability Improvement in Logic Circuit Stochastic Computation
Reliability Improvement in Logic Circuit Stochastic Computation
 
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMAN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
 
Masters_Thesis_FINAL_COPY
Masters_Thesis_FINAL_COPYMasters_Thesis_FINAL_COPY
Masters_Thesis_FINAL_COPY
 
Scc2012 Scala
Scc2012 ScalaScc2012 Scala
Scc2012 Scala
 
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETSIMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS
 
Chemistry development kit
Chemistry development kitChemistry development kit
Chemistry development kit
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization
 

Similar to Software architacture recovery

Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clusteringNishanth Harapanahalli
 
Binary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingBinary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingnong_dan
 
Principal of objected oriented programming
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming Rokonuzzaman Rony
 
Dot net interview questions and asnwers
Dot net interview questions and asnwersDot net interview questions and asnwers
Dot net interview questions and asnwerskavinilavuG
 
Software effort estimation through clustering techniques of RBFN network
Software effort estimation through clustering techniques of RBFN networkSoftware effort estimation through clustering techniques of RBFN network
Software effort estimation through clustering techniques of RBFN networkIOSR Journals
 
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...NicheTech Com. Solutions Pvt. Ltd.
 
Automatically translating a general purpose C image processing library for ...
Automatically translating a general purpose C   image processing library for ...Automatically translating a general purpose C   image processing library for ...
Automatically translating a general purpose C image processing library for ...Carrie Cox
 
Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Raffi Khatchadourian
 
Accelerating A C Image Processing Library With A GPU
Accelerating A C   Image Processing Library With A GPUAccelerating A C   Image Processing Library With A GPU
Accelerating A C Image Processing Library With A GPUDaniel Wachtel
 
indroduction of rain technology
indroduction of rain technologyindroduction of rain technology
indroduction of rain technologynarayan dudhe
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topiccsandit
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models IJECEIAES
 
The Concurrency Challenge : Notes
The Concurrency Challenge : NotesThe Concurrency Challenge : Notes
The Concurrency Challenge : NotesSubhajit Sahu
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software SystemsVincenzo De Florio
 
Cobe framework cloud ontology blackboard environment for enhancing discovery ...
Cobe framework cloud ontology blackboard environment for enhancing discovery ...Cobe framework cloud ontology blackboard environment for enhancing discovery ...
Cobe framework cloud ontology blackboard environment for enhancing discovery ...ijccsa
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETWaqas Tariq
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...butest
 

Similar to Software architacture recovery (20)

Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clustering
 
Binary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingBinary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programming
 
Principal of objected oriented programming
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming
 
Dot net interview questions and asnwers
Dot net interview questions and asnwersDot net interview questions and asnwers
Dot net interview questions and asnwers
 
Software effort estimation through clustering techniques of RBFN network
Software effort estimation through clustering techniques of RBFN networkSoftware effort estimation through clustering techniques of RBFN network
Software effort estimation through clustering techniques of RBFN network
 
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...
Objective of c in IOS , iOS Live Project Training Ahmedabad, MCA Live Project...
 
Aa03101540158
Aa03101540158Aa03101540158
Aa03101540158
 
C04701019027
C04701019027C04701019027
C04701019027
 
V2I6_IJERTV2IS60721
V2I6_IJERTV2IS60721V2I6_IJERTV2IS60721
V2I6_IJERTV2IS60721
 
Automatically translating a general purpose C image processing library for ...
Automatically translating a general purpose C   image processing library for ...Automatically translating a general purpose C   image processing library for ...
Automatically translating a general purpose C image processing library for ...
 
Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...
 
Accelerating A C Image Processing Library With A GPU
Accelerating A C   Image Processing Library With A GPUAccelerating A C   Image Processing Library With A GPU
Accelerating A C Image Processing Library With A GPU
 
indroduction of rain technology
indroduction of rain technologyindroduction of rain technology
indroduction of rain technology
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topic
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
 
The Concurrency Challenge : Notes
The Concurrency Challenge : NotesThe Concurrency Challenge : Notes
The Concurrency Challenge : Notes
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
 
Cobe framework cloud ontology blackboard environment for enhancing discovery ...
Cobe framework cloud ontology blackboard environment for enhancing discovery ...Cobe framework cloud ontology blackboard environment for enhancing discovery ...
Cobe framework cloud ontology blackboard environment for enhancing discovery ...
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NET
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...
 

More from Imdad Ul Haq

How amazon web services uses formal methods
How amazon web services uses formal methodsHow amazon web services uses formal methods
How amazon web services uses formal methodsImdad Ul Haq
 
Introduction to Function and there types
Introduction to Function and there typesIntroduction to Function and there types
Introduction to Function and there typesImdad Ul Haq
 
Enhance ERD(Entity Relationship Diagram)
Enhance ERD(Entity Relationship Diagram)Enhance ERD(Entity Relationship Diagram)
Enhance ERD(Entity Relationship Diagram)Imdad Ul Haq
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)Imdad Ul Haq
 
Ufone marketing oriantation
Ufone marketing oriantationUfone marketing oriantation
Ufone marketing oriantationImdad Ul Haq
 
Library management system
Library management systemLibrary management system
Library management systemImdad Ul Haq
 

More from Imdad Ul Haq (7)

How amazon web services uses formal methods
How amazon web services uses formal methodsHow amazon web services uses formal methods
How amazon web services uses formal methods
 
Introduction to Function and there types
Introduction to Function and there typesIntroduction to Function and there types
Introduction to Function and there types
 
Enhance ERD(Entity Relationship Diagram)
Enhance ERD(Entity Relationship Diagram)Enhance ERD(Entity Relationship Diagram)
Enhance ERD(Entity Relationship Diagram)
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)
 
Ufone marketing oriantation
Ufone marketing oriantationUfone marketing oriantation
Ufone marketing oriantation
 
Library management system
Library management systemLibrary management system
Library management system
 
Counting sort
Counting sortCounting sort
Counting sort
 

Recently uploaded

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Recently uploaded (20)

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

Software architacture recovery

  • 1. Comparing Software Architecture Recovery Using Accurate Dependency Presented by Imdad-Ul-Haq ID# 7661
  • 2. Introduction Software architecture is crucial for program comprehension, programmer communication, and software maintenance. Unfortunately, documented software architectures are either nonexistent or outdated for many software projects. While it is important for developers to document software architecture and keep it up-to-date, it is costly and difficult. Software grows bigger day by day so it’s difficult for developer to have complete knowledge about system. There are many techniques proposed for software architecture recovery.
  • 3. Problem Statement Developer using different techniques for software architecture recovery and they all have some effectiveness. Now to understand there effectiveness we use comparison among these techniques. Chromium is the largest open source software project which contains 10MLOC and order to recover it by experienced recoverer then it’s take a lot of time to recover.
  • 4. Methods For Comparison Include Dependency: The include dependencies are inaccurate. For example, file foo.c may declare that it includes bar.h, but may not use any functions or variables declared or defined in bar.h. Using include dependencies, one would conclude that foo.c depends on bar.h, while foo.c has no actualcode dependency on bar.h. Symbol Dependency: The symbol dependencies are more accurate. A sym-bol can be a function name or a global variable. For example, consider two files Alpha.c and Beta.c: file Alpha.c contains method A; and file Beta.c contains method B. If method A invokes method B, then method A depends on method B. Based on this information, we can conclude that file Alpha.c depends on file Beta.c.
  • 5. Approaches Our approach consists of two parts: the extraction of symbol dependencies and obtaining a ground-truth architecture. In the second part we use our methods include dependency and symbol dependency. The first one is used in C/C++ and the other one is used in Java. 1) Obtaining dependency for C/C++: To extract symbol dependencies, we use the technique which is compiles a project’s source files with LLVM into bit code, analyzes the bit code to extract the symbol dependencies for all symbols inside the project, and groups dependencies based on the files containing the symbols. Many header files contain only symbol declaration and to ensure we do not miss the dependency we #include statement in the source code. Include dependencies are transitive dependencies.
  • 6. Approaches 2) Obtaining dependency for Java Projects: To extract symbol dependencies, we leverage a tool that operates at the Java bytecode level and extracts high-level information from the bytecode in a human readable format. We extract include dependencies for Java projects from import statements in Java source code by utilizing a script to determine imports and their associated files. The script used to extract the dependencies detects all the files in a package. Then for every file, it evaluates each import statement and adds the files mentioned in the import as a dependency. When a wildcard import is evaluated, all classes in the referred package are added as dependencies. For Java projects, include dependencies miss relationships between files because they do not account for intra- package dependencies or fully-qualified name usage.
  • 7. Approaches 3) Obtaining Ground truth architecture: We using the sub module method for obtaining the preliminary ground truth architecture. It consists of three steps. First, we determine the module that each file belongs to by analyzing the configuration files of the project. Second, we determine the sub module relationship between modules. We define a sub module as a module that has all of its files contained within the subdirectory of another module.
  • 8. Continue We group modules that are sub modules of one another into a cluster. In the example above, we cluster modules A, B and C into a single cluster. This preliminary version of the ground-truth architecture does not accurately reflect the “real” architecture of the project.
  • 9. Techniques We use six algorithms in which four use dependency to determine the clusters while the remaining two techniques use textual information from source code. 1) ACDC (Algorithm for Comprehension-Driven Clustering): ACDC performed well and the main pattern used by it called “sub-graph dominator pattern”. ACDC detects a dominator node n0 and a set of nodes N = {ni | i ∈ N } that n0 dominates. ACDC groups the nodes of such a sub-graph together into a cluster. 2) Bunch: The Bunch technique that transforms the architecture recovery problem into an optimization problem. An optimization function called Modularization Quality (MQ) represents the quality of a recovered architecture. Bunch uses hill-climbing and genetic algorithms to find a partition (i.e., a grouping of software entities into clusters) that maximizes MQ.
  • 10. Techniques 3) WCA (Weighted Combined Algorithm): WCA is a hierarchical clustering algorithm that measures the inter-cluster distance between software entities and merges them into clusters based on this distance. Two measures are proposed to measure the inter-cluster distance: Unbiased Ellenberg (UE) and Unbiased Ellenberg-NM (UENM). The main difference between these measures is that UENM integrates more information into the measure and thus might obtain better results. 4) LIMBO: LIMBO is also a hierarchical clustering algorithm that aims to make the Information Bottleneck algorithm scalable for large data sets.
  • 11. Techniques 5) ARC (Architecture Recovery using Concerns): ARC is a hierarchical clustering algorithm that relies on information retrieval and machine learning to perform a recovery. This technique does not use dependencies and is therefore not used to evaluate the influence of different levels of dependencies. 6) ZBR (Zone Based Recovery): ZBR is a recovery technique based on natural language semantics of identifiers found in the source code. ZBR demonstrated accuracy in recovering Java package structure but struggled with memory issues when dealing with larger systems .
  • 12. Analyzed Projects We use the 5 large scale and open source projects which included Chromium , Bash , Arch Studio , ITK and Hadoop. For Bash, Hadoop, and Arch Studio, all techniques take a few seconds to a few minutes to run. For large projects such as ITK and Chromium, each technique take several hours or days to run.
  • 13. Accuracy Measures A recovered architecture may be different from a ground-truth architecture, but close to another ground-truth architecture of the same project. We used some matrices to find the accuracy of the above projects by include and symbol dependency methods. 1) TURBOMQ: TURBOMQ is an independent of any ground-truth architecture, which calculates the quality of the recovered architectures. µ i is the number of intra-relationships; ɛij + ɛji is the number of inter-relationships between cluster i and cluster j.
  • 14. Accuracy Measures 2) MoJoFM: MojoFM is denoted by the following formula where mno(A, B) is the minimum number of Move or Join operations needed to transform the recovered architecture A into the ground truth B. A score of 100% indicates that the architecture recovered is the same as the ground-truth architecture. 3) Architecture to Architecture: where mto(Ai , A j ) is the minimum number of operations needed to transform architecture Ai into A j ; and aco(Ai ) is the number of operations needed to construct architecture Ai from a “null” architecture A∅.
  • 15. Accuracy Measures 4) Cluster to Cluster: The following formula denotes the cluster algorithm where A1 is the recovered architecture; A2 is a ground-truth architecture; and A2. C are the clusters of A2 . thcvg is a threshold indicating how high the c2c value must be for a technique’s cluster and a ground-truth cluster in order to count the latter as covered.
  • 19. Conclusion The above experiment shows that symbol dependency gives more accurate result then include dependency. In general, each recovery technique extracted a better quality architecture when using symbol dependencies instead of the less-detailed include dependencies. In general conclusion that quality of input affects quality of output. The results show that not only does each recovery technique produce better output with better input, but also that the highest scoring technique often changes when the input changes.