0
Metrics and Problem Detection



Jorge Ressia
Software is complex.



   29% Succeeded

      18% Failed



   53% Challenged



 The Standish Group, 2004
How large is your project?


    1’000’000 lines of code
    * 2 = 2’000’000 seconds
      / 3600 = 560 hours
         / 8...
}
                                                 }
                                             {
                      ...
fo
                                             rw
                                              ar
                      ...
What is the current state?




                                           fo
                                             ...
fo
                                                                rw
                                               g
   ...
Reverse engineering is analyzing a subject system to:
 identify components and their relationships, and
 create more abstr...
quality?
        udg e its
    to j
How                  {
                         {
                                    ...
http://moose.unibe.ch




http://loose.upt.ro/incode
1
Metrics
                             2Design
                             Problems




              3
          Code Du...
Metrics




          1
Youcannot control
what you cannot measure.




                           Tom de Marco
Metrics are functions that assign numbers to
products, processes and resources.
Software metrics are measurements which
relate to software systems, processes or
related documents.
Metrics compress system traits into numbers.
Let’s see some examples...
Examples of size metrics


NOM - number of methods
NOA - number of attributes
LOC - number of lines of code
NOS - number o...
McCabe cyclomatic complexity (CYCLO) counts
the number of independent paths through the code of a
function.

             ...
Weighted Method Count (WMC) sums up the
complexity of class’ methods (measured by the metric
of your choice; usually CYCLO...
Depth of Inheritance Tree (DIT) is the (maximum)
depth level of a class in a class hierarchy.


                          ...
Coupling between objects (CBO) shows the number
of classes from which methods or attributes are used.


                  ...
Tight Class Cohesion (TCC) counts the relative
number of method-pairs that access attributes of the
class in common.

    ...
Access To Foreign Data (ATFD) counts how many
attributes from other classes are accessed directly from
a measured class.

...
...
Design Problems




                  2
McCall, 1977
Metrics Assess and Improve Quality!




                  ea lly?
              R
McCall, 1977
Problem 1: metrics granularity




                                           ?
capture symptoms, not causes of problems

...
2   big obstacles in using metrics:


     Thresholds make metrics hard to interpret

     Granularity make metrics hard t...
Can metrics help me
             in what I really care for? :)
et rics!
                                                ith m
                                        o   do w
          ...
How to get an initial   understanding of a system?
Metric   Value
LOC      35175
NOM       3618
NOC        384
CYCLO     5579
NOP         19
CALLS    15128
FANOUT    8590
AH...
Metric                  Value
LOC                      35175
NOM                       3618
NOC                        384...
We need means to compare.
hierarchies?

               coupling?
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006


                      ...
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006




                    ...
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006




                    ...
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006


                      ...
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006




                    ...
Java                 C++
            LOW    AVG    HIGH   LOW    AVG    HIGH

CYCLO/LOC   0.16   0.20   0.24   0.20   0.25...
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006




                    ...
The Overview Pyramid provides a metrics
overview.                         Lanza, Marinescu 2006




 close to high     clo...
rics!
                                                      met
                                            do with
      ...
How do I improve code?
Quality is more than 0 bugs.



Breaking design principles, rules and best practices
          deteriorates the code;
    ...
Imagine changing just a small design fragment




and33%
of all classes
would require changes
expensive
Design problems
are frequent
                    unavoidable


                                                 ...
God Classes tend to centralize the intelligence of the
system, to do everything and to use data from small
data-classes.
 ...
God Classes tend
    to centralize the intelligence of the system,
    to do everything and
    to use data from small dat...
God Classes
    centralize the intelligence of the system,
    do everything and
    use data from small data-classes.
God Classes
    are complex,
    are not cohesive,
    access external data.
God Classes
    are complex,
     
    WMC is high
    are not cohesive,

    TCC is low
    access external data.
 ATFD m...
Detection Strategies are metric-based queries to
detect design flaws.                  Lanza, Marinescu 2006




          ...
A God Class centralizes too much intelligence in
the system.                         Lanza, Marinescu 2006


       Class ...
An Envious Method is more interested in data
from a handful of classes.         Lanza, Marinescu 2006


      Method uses ...
Data Classes are dumb data holders.
                                                  Lanza, Marinescu 2006




        In...
Data Classes are dumb data holders.
                                             Lanza, Marinescu 2006

     More than a f...
Shotgun Surgery depicts that a change in an
operation triggers many (small) in a lot of different
operation and classes.
 ...
Code Duplication




                   3
What is Code Duplication?

                                            so f it?
                                 pro blem
...
Code Duplication Detection




     Lexical Equivalence

     Syntactical Equivalence

     Semantic Equivalence
Visualization of Copied Code Sequences

                File A      File B




      File A




      File B
Transformation                   Comparison




           Source Code            Transformed Code              Duplicatio...
Noise Elimination




…
//assign same fastid as container    fastid=NULL;
fastid = NULL;                       constchar*f...
Enhanced Simple Detection Approach
lines from source
         a   b   c   d    a    b     c   d




 lines
 from
source
lines from source
         a   b   c   d    a    x     y   d




 lines
 from
source
lines from source
         a   b   c   a    b    x     y   c




 lines
 from
source
lines from source
         a   x   b    x   c    x     d   x




 lines
 from
source
lines from source 2




  lines
  from
source 1
lines from source 2




  lines
  from
source 1
lines from source 2




  lines
  from
source 1




           exact
           chunk
lines from source 2




  lines
  from
source 1




           exact      line
           chunk      bias
lines from source 2




  lines
  from
source 1




           exact      line       exact
           chunk      bias     ...
Significant Duplication:
- It is the largest possible duplication chain uniting
all exact clones that are close enough to e...
1
Metrics
                             2Design
                             Problems




              3
          Code Du...
Shotgun
                                Surgery                        has

     uses                  is

               ...
Follow a clear and repeatable process



                                                  mb ers!
                       ...
QA is part of the the Development Process




         http://loose.upt.ro/incode
Tudor Gîrba
       www.tudorgirba.com


      Jorge Ressia




creativecommons.org/licenses/by/3.0/
Upcoming SlideShare
Loading in...5
×

05 Problem Detection

1,217

Published on

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,217
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
20
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "05 Problem Detection"

  1. 1. Metrics and Problem Detection Jorge Ressia
  2. 2. Software is complex. 29% Succeeded 18% Failed 53% Challenged The Standish Group, 2004
  3. 3. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours / 8 = 70 days / 20 = 3 months
  4. 4. } } { { } } { { g rin ee gin en d ar rw fo
  5. 5. fo rw ar d en gin ee rin g { { { { { { } { { } } actual development } } } { } } }
  6. 6. What is the current state? fo rw ar What should we do? d en Where to start? gin ee How to proceed? rin g { { { { { { } { { } } actual development } } } { } } }
  7. 7. fo rw g rin ar ee d gin en gin en ee e rs rin ve g re { { { { { { } { { } } actual development } } } { } } }
  8. 8. Reverse engineering is analyzing a subject system to: identify components and their relationships, and create more abstract representations. Chikofky & Cross, 90
  9. 9. quality? udg e its to j How { { { { } } } } { } A large system contains lots of details.
  10. 10. http://moose.unibe.ch http://loose.upt.ro/incode
  11. 11. 1 Metrics 2Design Problems 3 Code Duplication
  12. 12. Metrics 1
  13. 13. Youcannot control what you cannot measure. Tom de Marco
  14. 14. Metrics are functions that assign numbers to products, processes and resources.
  15. 15. Software metrics are measurements which relate to software systems, processes or related documents.
  16. 16. Metrics compress system traits into numbers.
  17. 17. Let’s see some examples...
  18. 18. Examples of size metrics NOM - number of methods NOA - number of attributes LOC - number of lines of code NOS - number of statements NOC - number of children Lorenz, Kidd, 1994 Chidamber, Kemerer, 1994
  19. 19. McCabe cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function. McCabe, 1977  it reveals the minimum number of tests to write  interpretation can’t directly lead to improvement action
  20. 20. Weighted Method Count (WMC) sums up the complexity of class’ methods (measured by the metric of your choice; usually CYCLO). Chidamber, Kemerer, 1994  it is configurable, thus adaptable to our precise needs  interpretation can’t directly lead to improvement action
  21. 21. Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy. Chidamber, Kemerer, 1994  inheritance is measured  only the potential and not the real impact is quantified
  22. 22. Coupling between objects (CBO) shows the number of classes from which methods or attributes are used. Chidamber, Kemerer, 1994  it takes into account real dependencies not just declared ones  no differentiation of types and/or intensity of coupling
  23. 23. Tight Class Cohesion (TCC) counts the relative number of method-pairs that access attributes of the class in common. Bieman, Kang, 1995 TCC = 2 / 10 = 0.2  interpretation can lead to improvement action  ratio values allow comparison between systems
  24. 24. Access To Foreign Data (ATFD) counts how many attributes from other classes are accessed directly from a measured class. Marinescu 2006
  25. 25. ...
  26. 26. Design Problems 2
  27. 27. McCall, 1977
  28. 28. Metrics Assess and Improve Quality! ea lly? R
  29. 29. McCall, 1977
  30. 30. Problem 1: metrics granularity ? capture symptoms, not causes of problems in isolation, they don’t lead to improvement solutions Problem 2: implicit mapping we don’t reason in terms of metrics, but in terms of design principles
  31. 31. 2 big obstacles in using metrics: Thresholds make metrics hard to interpret Granularity make metrics hard to use in isolation
  32. 32. Can metrics help me in what I really care for? :)
  33. 33. et rics! ith m o do w thi ng t an t no Iw fo rw ar How do I understand code? d en How do I improve code? gin ee rin g { { { { { { } { { } } actual development } } } { } } }
  34. 34. How to get an initial understanding of a system?
  35. 35. Metric Value LOC 35175 NOM 3618 NOC 384 CYCLO 5579 NOP 19 CALLS 15128 FANOUT 8590 AHH 0.12 ANDC 0.31
  36. 36. Metric Value LOC 35175 NOM 3618 NOC 384 CYCLO 5579 NOP 19 CALLS 15128 FANOUT 8590 t? wha AHH nd now 0.12 A ANDC 0.31
  37. 37. We need means to compare.
  38. 38. hierarchies? coupling?
  39. 39. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 Inheritance ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT Size Communication
  40. 40. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT Size
  41. 41. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT Communication
  42. 42. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 Inheritance ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT
  43. 43. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT
  44. 44. Java C++ LOW AVG HIGH LOW AVG HIGH CYCLO/LOC 0.16 0.20 0.24 0.20 0.25 0.30 LOC/NOM 7 10 13 5 10 16 NOM/NOC 4 7 10 4 9 15 ...
  45. 45. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 ANDC 0.31 AHH 0.12 20.21 NOP 19 9.42 NOC 384 9.72 NOM 3618 NOM 418 0.15 LOC 35175 15128 CALLS 0.56 CYCLO 5579 8590 FANOUT close to high close to average close to low
  46. 46. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 close to high close to average close to low
  47. 47. rics! met do with o ng t n othi Iw ant fo rw ar How do I understand code? d en How do I improve code? gin ee rin g { { { { { { } { { } } actual development } } } { } } }
  48. 48. How do I improve code?
  49. 49. Quality is more than 0 bugs. Breaking design principles, rules and best practices deteriorates the code; it leads to design problems.
  50. 50. Imagine changing just a small design fragment and33% of all classes would require changes
  51. 51. expensive Design problems are frequent unavoidable ? e them inat elim and detect How to
  52. 52. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes. Riel, 1996
  53. 53. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes.
  54. 54. God Classes centralize the intelligence of the system, do everything and use data from small data-classes.
  55. 55. God Classes are complex, are not cohesive, access external data.
  56. 56. God Classes are complex, WMC is high are not cohesive, TCC is low access external data. ATFD more than few su sing u erie to q s cs in rator etri pe se m ical o C o mpo log
  57. 57. Detection Strategies are metric-based queries to detect design flaws. Lanza, Marinescu 2006 Rule 1 METRIC 1 > Threshold 1 AND Quality problem Rule 2 METRIC 2 < Threshold 2
  58. 58. A God Class centralizes too much intelligence in the system. Lanza, Marinescu 2006 Class uses directly more than a few attributes of other classes ATFD > FEW Functional complexity of the class is very high AND GodClass WMC ! VERY HIGH Class cohesion is low TCC < ONE THIRD
  59. 59. An Envious Method is more interested in data from a handful of classes. Lanza, Marinescu 2006 Method uses directly more than a few attributes of other classes ATFD > FEW Method uses far more attributes of other classes than its own AND Feature Envy LAA < ONE THIRD The used quot;foreignquot; attributes belong to very few other classes FDP ! FEW
  60. 60. Data Classes are dumb data holders. Lanza, Marinescu 2006 Interface of class reveals data rather than offering services WOC < ONE THIRD AND Data Class Class reveals many attributes and is not complex
  61. 61. Data Classes are dumb data holders. Lanza, Marinescu 2006 More than a few public data NOAP + NOAM > FEW AND Complexity of class is not high WMC < HIGH Class reveals many OR attributes and is not Class has many public complex data NOAP + NOAM > MANY AND Complexity of class is not very high WMC < VERY HIGH
  62. 62. Shotgun Surgery depicts that a change in an operation triggers many (small) in a lot of different operation and classes. Lanza, Marinescu 2006
  63. 63. Code Duplication 3
  64. 64. What is Code Duplication? so f it? pro blem he ta re t a Wh
  65. 65. Code Duplication Detection Lexical Equivalence Syntactical Equivalence Semantic Equivalence
  66. 66. Visualization of Copied Code Sequences File A File B File A File B
  67. 67. Transformation Comparison Source Code Transformed Code Duplication Data Author Level Transformed Code Comparison Technique Johnson 94 Lexical Substrings String-Matching Ducasse 99 Lexical Normalized Strings String-Matching Baker 95 Syntactical Parameterized Strings String-Matching Mayrand 96 Syntactical Metrics Tuples Discrete comparison Kontogiannis 97 Syntactical Metrics Tuples Euclidean distance Baxter 98 Syntactical AST Tree-Matching
  68. 68. Noise Elimination … //assign same fastid as container fastid=NULL; fastid = NULL; constchar*fidptr=get_fastid(); const char* fidptr = get_fastid(); if(fidptr!=NULL) if(fidptr != NULL) { intl=strlen(fidptr) int l = strlen(fidptr); fastid = newchar[l+] fastid = newchar[ l + 1 ];
  69. 69. Enhanced Simple Detection Approach
  70. 70. lines from source a b c d a b c d lines from source
  71. 71. lines from source a b c d a x y d lines from source
  72. 72. lines from source a b c a b x y c lines from source
  73. 73. lines from source a x b x c x d x lines from source
  74. 74. lines from source 2 lines from source 1
  75. 75. lines from source 2 lines from source 1
  76. 76. lines from source 2 lines from source 1 exact chunk
  77. 77. lines from source 2 lines from source 1 exact line chunk bias
  78. 78. lines from source 2 lines from source 1 exact line exact chunk bias chunk
  79. 79. Significant Duplication: - It is the largest possible duplication chain uniting all exact clones that are close enough to each other. - The duplication is large enough. Lanza, Marinescu 2006
  80. 80. 1 Metrics 2Design Problems 3 Code Duplication
  81. 81. Shotgun Surgery has uses is has (partial) Feature Data Envy uses Class is partially God has Intensive Class Coupling Brain has has Method Extensive Brain has Significant Coupling Class Duplication has is is has Refused is Tradition Parent Breaker Bequest has (subclass) Futile Hierarchy Lanza, Marinescu 2006 Identity Collaboration Classification Disharmonies Disharmonies Disharmonies
  82. 82. Follow a clear and repeatable process mb ers! u ms of n in ter qu ality ut n abo easo Don’t r
  83. 83. QA is part of the the Development Process http://loose.upt.ro/incode
  84. 84. Tudor Gîrba www.tudorgirba.com Jorge Ressia creativecommons.org/licenses/by/3.0/
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×