On the Relevance of Code Anomalies for
Identifying Architecture Degradation Symptoms

Isela Macía (1), Roberta Arcoverde (1), Alessandro Garcia (1),
Christina Chavez (2), Arndt von Staa (1)

(1) Pontifical Catholic University of Rio de Janeiro – PUC-Rio, Brazil
(2) Federal University of Bahia – UFBA, Brazil

LES | DI | PUC-Rio - Brazil                                                OPUS Group

Code Anomalies

      “A code smell is a surface indication
      that usually corresponds to a deeper
      problem in the system.”

                       Martin Fowler, 1999




                         Roberta @ OPUS Group   2
Architectural Anomalies

[Figure: three subsystems and the concerns they implement.
   ModuleA <<subsystem>>: Concern A1, Concern A2, Concern C1
   ModuleB <<subsystem>>: Concern B1, Concern B2, Concern C2
   ModuleC <<subsystem>>: Concern C
 Caption: Scattered Functionality]

Relevance of Code Anomalies

 public class HWFacade{

     public void updateComplaint(..){..}
     public Complaint searchComplaint(..){..}
     public void insertComplaint(..){..}

     public void insertEmployee(..){..}
     public Employee searchEmployee(..){..}
     public void updateEmployee(..){..}

     public void insertSymptom(..){..}
     public Symptom searchSymptom(..){..}
     public void updateSymptom(..){..}
     ...
 }

[Figure: architecture diagram showing the GUI and Business subsystems and the elements HWFacade, Employee, Symptom and Complaint.]

Relevance of Code Anomalies

 public class ComplaintRepo{
   ...
   public int insert(..){..}
   public void update(..){..}
   public int getIndex(..){..}
   public boolean exists(..){..}
   public Complaint search(..){..}
   public void reset(..){..}
   public Object next(..){..}
   public void remove(..){..}
   public List getList(..){..}
   public boolean hasNext(..){..}
   public void updateTimestamp(..){..}
   public int searchTimestamp(..){..}
    ...
 }

[Figure: diagram of the DATA subsystem; element labels: EmployeeArray, ComplaintRepo, ArrayRepository, Repository, Factory.]

Previous Research on Code Anomaly Impact

                     Khomh et al. WCRE '09

                    Anomalous code elements tend to
                    be changed more frequently than
                    anomaly-free elements

Previous Research on Code Anomaly Impact

                     D'Ambros et al. QSIC '10

                     There are no code anomalies that
                     can be considered more harmful with
                     respect to software defects

Architecture Degradation
   Major software engineering problem
       Might hinder system evolution
   Early identification could help avoid it
       But can it be identified from code anomalies?

Three Questions

 1     Are anomalous code elements related to
       architecture problems?

 2     If so, which characteristics of the code
       anomaly are relevant for the architecture
       design?

 3     To what extent did the applied refactorings
       actually address architecturally-relevant
       code anomalies?

Target Systems

    System      Language         Size        Code anomalies
    MIDAS       C++              76 KLOC     111
    MM          Java/AspectJ     54 KLOC     170
    HW          Java/AspectJ     49 KLOC     252
    PDP         C#               22 KLOC     175

    6 different systems
    40 revisions
    Architecture information available

Study Phases
1.   Data Collection
2.   Analysis of the impact of code anomalies on
     identified architecture problems
3.   Refactoring extraction
4.   Analysis of the impact of refactorings on
     identified architecture problems

Data Collection

Recovering Actual Architecture
Identifying Architecture Problems
Detecting Code Anomalies
Analyzing the Impact of Code Anomalies

[Figure: architecture diagram with DATA, BUSINESS and GUI modules.]

Analysis

Recovering Actual Architecture
Identifying Architecture Problems
Detecting Code Anomalies
Analyzing the Impact of Code Anomalies

[Figure: architecture diagram with DATA, BUSINESS and GUI modules.]

Analyzing the Impact of Code Anomalies I
   Null hypothesis: There is no relation between
    code anomalies and architecture problems
   Fisher’s exact test
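
A minimal sketch of how a two-sided Fisher's exact test can be computed for one system version is given below. The slides do not say how the contingency table was built, so the layout assumed here (rows: element has / does not have a code anomaly; columns: element is / is not related to an architecture problem) and the counts in `main` are hypothetical, and the class and method names are illustrative only.

```java
/** Sketch of a two-sided Fisher's exact test for a 2x2 contingency table. */
final class FisherExactSketch {

    // ln(n!) computed by summing logarithms; adequate for small counts.
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    // Hypergeometric probability of the table [[a, b], [c, d]] with fixed margins.
    static double tableProbability(int a, int b, int c, int d) {
        double logP = logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d)
                - logFactorial(a + b + c + d);
        return Math.exp(logP);
    }

    // Two-sided p-value: sum the probabilities of every table with the same
    // margins that is no more likely than the observed table.
    static double twoSidedPValue(int a, int b, int c, int d) {
        double observed = tableProbability(a, b, c, d);
        int row1 = a + b;
        int col1 = a + c;
        int n = a + b + c + d;
        int lo = Math.max(0, row1 + col1 - n);
        int hi = Math.min(row1, col1);
        double p = 0.0;
        for (int x = lo; x <= hi; x++) {
            double px = tableProbability(x, row1 - x, col1 - x, n - row1 - col1 + x);
            if (px <= observed * (1 + 1e-9)) { // small tolerance for rounding
                p += px;
            }
        }
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        // Hypothetical counts, not data from the study.
        double p = twoSidedPValue(8, 2, 1, 5);
        System.out.println("two-sided p-value = " + p);
    }
}
```

Under this reading, the null hypothesis would be rejected for a version when the p-value falls below the chosen significance level; the slides do not state which level was used.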




Analyzing the Impact of Code Anomalies I
Code anomalies and architecture problems were
  related in

                77.5%
 of the analyzed versions

Analyzing the Impact of Code Anomalies II
   Downstream Analysis
       Which architecture problems were caused by code
        anomalies

[Figure: architecture diagram with DATA, BUSINESS and GUI modules.]

Analyzing the Impact of Code Anomalies II
     Downstream Analysis

[Figure: bar chart, vertical axis from 0 to 100, one bar per system (HW, MM, PDP, MIDAS); series: "Caused by Code Anomalies" and "Not Caused by Code Anomalies".]

Analyzing the Impact of Code Anomalies II
   Upstream Analysis
       Which code anomalies caused architecture
        problems

[Figure: architecture diagram with DATA, BUSINESS and GUI modules.]

Analyzing the Impact of Code Anomalies II
   Upstream Analysis

[Figure: bar chart, vertical axis from 0 to 100, one bar per system (HW, MM, PDP, MIDAS); series: "Relevant" and "Irrelevant".]

Identifying Relevant Code Anomalies
   Code anomalies were divided by
       Type of code anomaly
       Earliness of anomaly




Type of Code Anomaly

[Figure: number of releases in which each type of anomaly was
statistically significant (causing architecture problems).]

Earliness of Anomaly
   Early anomaly: appears in the 1st version of
    each system

   18% of all architecturally-relevant code anomalies
   were identified as early anomalies, and were related
   to more than 37% of all architecture problems

Refactoring of Relevant Anomalies
   We wanted to analyze whether architecturally-
    relevant anomalies were often refactored
   Detecting refactorings from source code history
       Commit messages
       Source code diffs (manually inspected)
   Checking whether the refactored anomaly was
    architecturally-relevant
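
As an illustration of the commit-message step only, the sketch below flags messages that mention a refactoring so that their diffs can then be inspected by hand. The keyword pattern, class name and sample messages are assumptions made for this example; the slides do not describe the criteria actually applied to commit messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: pre-select commits whose messages hint at a refactoring. */
final class RefactoringCommitFilter {

    // Hypothetical keyword list (not taken from the study). Word boundaries
    // keep "remove" from matching the "move" alternative.
    private static final Pattern HINT = Pattern.compile(
            "\\b(refactor\\w*|renam\\w*|extract\\w*|mov(e|ed|ing)|restructur\\w*)\\b",
            Pattern.CASE_INSENSITIVE);

    static boolean looksLikeRefactoring(String commitMessage) {
        return HINT.matcher(commitMessage).find();
    }

    // Returns the messages whose diffs deserve manual inspection.
    static List<String> candidates(List<String> commitMessages) {
        List<String> selected = new ArrayList<>();
        for (String message : commitMessages) {
            if (looksLikeRefactoring(message)) {
                selected.add(message);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up commit messages for demonstration.
        List<String> log = List.of(
                "Extract superclass from ComplaintRepo",
                "Fix null pointer in search",
                "Remove unused import",
                "Renamed HWFacade methods");
        System.out.println(candidates(log));
    }
}
```

A filter like this only narrows the search: refactorings that are not announced in the message would still have to be found by reading the source code diffs.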



Refactoring of Relevant Anomalies
   658 refactorings
       33% high-level
            Move member (16%)
            Extract class or superclass (12%)
       67% low-level
            Rename (32%)
            Extract local variable (16%)
   37% of all architecturally-relevant anomalies were
    refactored
   Most of the refactoring effort was concentrated in
    isolated versions

Concluding Remarks

Architecturally-relevant anomalies
are not frequently refactored

Thank you




                ?
Cause-Effect Criteria

1      Recurrently inferred in all system versions

2      Observed in different modules of the same system

3      Modules involved contributions from different developers

   Isela Macia et al – AOSD 2011: An Exploratory Study of Code Smells in Aspect-Oriented Systems.

   Isela Macia et al – AOSD 2012: Are Automatically-detected Code Anomalies relevant to
         Architectural Modularity?


Editor's Notes

  • #2 Presentation structure (following the paper): 1- Definitions (code anomalies, architectural anomalies, architecture degradation); 2- Problem/Motivation; 3- Methodology; 4- Study Cases; 5- Results; 6- Concluding Remarks
  • #3 The metaphor of code anomaly or bad smell was coined by Fowler and Beck as a program structure that usually indicates a deeper problem in the system.
  • #4 However, code anomalies are particularly severe when they introduce architecture problems. Examples of these problems are architectural anomalies. Architectural anomalies are compositions of architecture elements that hinder system maintainability. There are several examples of architectural anomalies documented in the literature.
  • #5 We'd also like to define architecturally-relevant code anomalies. We call architecturally-relevant all those code anomalies that cause or are related to architecture problems. This God Class, for example. We took this example from a real-world application that we analyzed in our study.
  • #7 Many studies have been dedicated to analyzing the impact of code anomalies. Khomh et al. [17] also investigate the impact of code anomalies on system changes. They found that anomalous code elements tend to change more frequently than anomaly-free elements.
  • #8 Other works investigate the impact of automatically-detected code anomalies on software defects (i.e. the need for corrective maintenance). For instance, D'Ambros et al. found that, while some code anomalies are more frequent, none of them can be considered more harmful with respect to software defects.
  • #14 This phase was based on a semi-automatic process. We used Sonar [43] and Understand [47] to support the recovery of the actual architecture from the source code. These tools support architecture and code analyses in order to help developers analyze and measure the modularity of the system's architecture and implementation.
  • #15 Developers and architects collaborated to provide explicit mappings between the actual, extracted architecture (EA) and the intended architecture (IA). These mappings are used by the Reflexion Model-based tools to measure conformance in terms of convergence (a component or relationship that is in both EA and IA), divergence (a component or relationship that is in EA but not in IA), and absence (a component or relationship that is in IA but not in EA). For instance, all absence classifications were considered violations. Architectural anomalies were detected by architects based mainly on: (i) a visual inspection of the EA, and (ii) a careful analysis of the code-level elements mapped to architectural-level elements, due to the lack of tools. We also asked the original architects to indicate other anomalies observed in the architecture design beyond those presented in Table 2. This helped us to better judge whether and which code anomalies are good indicators of architectural modularity problems.
  • #16 For detecting the code anomalies, we used different tools, based on detection strategies. For example, for the Java projects, Together and Understand were used for identifying code anomalies; PDP, on the other hand, is a C# project, so we used NDepend to analyze it. In this illustration, the red crosses mark detected code anomalies.
  • #17 Finally, we analyzed the relations between detected code anomalies and the identified architecture problems.
  • #35 Furthermore, our study provided some findings that can help developers to build more effective tools for identifying more severe code smells. For instance, some architecturally-relevant code smell occurrences cannot be detected and prioritized if architectural decisions are not somehow traced and mapped to the source code, and used by code-level smell detection tools.
  • #36 Finally, our results suggest that mechanisms for detecting architecturally-relevant code anomalies should also analyze the relationship between code anomalies and their impact on the architecture design; that is, they should look for patterns of code anomalies rather than rely solely on individual code anomalies. We also observed that certain recurring patterns of co-occurring code anomalies and the propagation of code anomalies from parents to children in the inheritance trees tend to be stronger indicators of architecture problems than individual anomaly occurrences.
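
The conformance check described in note #15 (convergence, divergence, absence between the extracted architecture EA and the intended architecture IA) can be sketched as a comparison of two sets of relationships. This is only an illustration of the three definitions, not the Reflexion Model tooling used in the study; the class name and the string encoding of relationships are assumptions, and the example relationships between the DATA, BUSINESS and GUI modules are invented.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch: classify relationships of the intended (IA) and extracted (EA) architectures. */
final class ReflexionSketch {

    enum Status { CONVERGENCE, DIVERGENCE, ABSENCE }

    // Relationships are encoded as "SOURCE->TARGET" strings (an assumption of this sketch).
    static Map<String, Status> classify(Set<String> intended, Set<String> extracted) {
        Map<String, Status> result = new TreeMap<>();
        for (String relationship : extracted) {
            // In EA and IA: convergence. In EA but not in IA: divergence.
            result.put(relationship,
                    intended.contains(relationship) ? Status.CONVERGENCE : Status.DIVERGENCE);
        }
        for (String relationship : intended) {
            // In IA but not in EA: absence.
            if (!extracted.contains(relationship)) {
                result.put(relationship, Status.ABSENCE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical relationships, not taken from the analyzed systems.
        Set<String> ia = Set.of("GUI->BUSINESS", "BUSINESS->DATA");
        Set<String> ea = Set.of("GUI->BUSINESS", "GUI->DATA");
        System.out.println(classify(ia, ea));
        // prints {BUSINESS->DATA=ABSENCE, GUI->BUSINESS=CONVERGENCE, GUI->DATA=DIVERGENCE}
    }
}
```

In this made-up example the GUI reaching DATA directly would be reported as a divergence, and the missing BUSINESS to DATA relationship as an absence, which note #15 says was treated as a violation.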