# Problem Detection (EVO 2008)

This set of slides was used in a lecture on Problem Detection.



1. Metrics and Problem Detection. Tudor Gîrba, www.tudorgirba.com
2. Software is complex. 29% Succeeded, 18% Failed, 53% Challenged. The Standish Group, 2004
3. How large is your project?
4. How large is your project? 1’000’000 lines of code
5. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds
6. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours
7. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours / 8 = 70 days
8. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours / 8 = 70 days / 20 = 3 months
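The arithmetic in the slides above can be sketched as a tiny function. The 2 seconds per line, 8-hour days, and 20 working days per month are the slides' own assumptions; the slides round the intermediate results.

```python
# Back-of-the-envelope reading effort for a codebase, using the
# assumptions from the slides: 2 s per line, 8 h/day, 20 days/month.
# Illustrative only, not a measurement.
def reading_effort(loc, secs_per_line=2, hours_per_day=8, days_per_month=20):
    seconds = loc * secs_per_line
    hours = seconds / 3600
    days = hours / hours_per_day
    months = days / days_per_month
    return seconds, hours, days, months

seconds, hours, days, months = reading_effort(1_000_000)
print(f"{seconds} s = {hours:.0f} h = {days:.0f} days = {months:.1f} months")
```

Just reading a million lines, once, at a glance per line, costs several person-months; that is the point of the slide sequence.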
9. [Diagram: braces of code, labeled “forward engineering”]
10. [Diagram: forward engineering above, actual development among the braces below]
11. What is the current state? What should we do? Where to start? How to proceed? [over the forward engineering vs. actual development diagram]
12. [Diagram: forward engineering and reverse engineering surrounding actual development]
13. Reverse engineering is analyzing a subject system to: identify components and their relationships, and create more abstract representations. Chikofsky & Cross, 1990
14. A large system contains lots of details. [braces of code]
15. How to judge its quality? A large system contains lots of details.
16. http://moose.unibe.ch http://loose.upt.ro/incode
17. 1 Metrics, 2 Design Problems, 3 Code Duplication
18. Metrics (1)
19. You cannot control what you cannot measure. Tom DeMarco
20. Metrics are functions that assign numbers to products, processes and resources.
21. Software metrics are measurements which relate to software systems, processes or related documents.
22. Metrics compress system traits into numbers.
23. Let’s see some examples...
24. Examples of size metrics: NOM - number of methods, NOA - number of attributes, LOC - number of lines of code, NOS - number of statements, NOC - number of children. Lorenz, Kidd, 1994; Chidamber, Kemerer, 1994
25. McCabe cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function. McCabe, 1977. Pro: it reveals the minimum number of tests to write. Con: interpretation can’t directly lead to improvement action.
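A rough sketch of CYCLO for Python functions: one plus the number of branch points found in the AST. Real tools differ in detail (for example, how boolean operators and case arms are weighted); this only illustrates the "independent paths" idea.

```python
import ast

# Branch-introducing AST nodes; a simplification of what real
# cyclomatic-complexity tools count.
BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclo(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCHES) for node in ast.walk(tree))

src = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclo(src))  # 1 + two if-branches = 3
```

Three independent paths means at least three tests to cover them, which is exactly the "minimum number of tests" reading of the metric.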
26. Weighted Method Count (WMC) sums up the complexity of a class’ methods (measured by the metric of your choice; usually CYCLO). Chidamber, Kemerer, 1994. Pro: it is configurable, thus adaptable to our precise needs. Con: interpretation can’t directly lead to improvement action.
27. Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy. Chidamber, Kemerer, 1994. Pro: inheritance is measured. Con: only the potential and not the real impact is quantified.
28. Coupling Between Objects (CBO) shows the number of classes from which methods or attributes are used. Chidamber, Kemerer, 1994. Pro: it takes into account real dependencies, not just declared ones. Con: no differentiation of types and/or intensity of coupling.
29. Tight Class Cohesion (TCC) counts the relative number of method pairs that access attributes of the class in common. Bieman, Kang, 1995. Example: TCC = 2 / 10 = 0.2. Pros: interpretation can lead to improvement action; ratio values allow comparison between systems.
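TCC can be sketched directly from its definition: two methods are connected when they access at least one attribute in common, and TCC is the fraction of connected method pairs. The method-to-attributes map below is a made-up example chosen to reproduce the slide's TCC = 2/10.

```python
from itertools import combinations

# TCC sketch: connected method pairs / all method pairs.
def tcc(accesses):
    pairs = list(combinations(accesses, 2))
    connected = sum(1 for a, b in pairs if accesses[a] & accesses[b])
    return connected / len(pairs)

# Hypothetical class with 5 methods: m1/m2 share attribute x,
# m3/m4 share y, m5 touches z alone -> 2 connected pairs of 10.
accesses = {
    "m1": {"x"}, "m2": {"x"},
    "m3": {"y"}, "m4": {"y"},
    "m5": {"z"},
}
print(tcc(accesses))  # 2 / 10 = 0.2
```

Because the result is a ratio in [0, 1], values are comparable across classes and systems, which is the "Pro" the slide highlights.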
30. Access To Foreign Data (ATFD) counts how many attributes from other classes are accessed directly from a measured class. Marinescu, 2006
31. ...
32. Design Problems (2)
33. McCall, 1977 [quality model]
34. Metrics Assess and Improve Quality!
35. Metrics Assess and Improve Quality! Really?
36. McCall, 1977 [quality model]
37. Problem 1: metrics granularity. Metrics capture symptoms, not causes of problems; in isolation, they don’t lead to improvement solutions.
38. Problem 1: metrics granularity. Metrics capture symptoms, not causes of problems; in isolation, they don’t lead to improvement solutions. Problem 2: implicit mapping. We don’t reason in terms of metrics, but in terms of design principles.
39. Two big obstacles in using metrics: Thresholds make metrics hard to interpret. Granularity makes metrics hard to use in isolation.
40. Can metrics help me in what I really care for? :)
41. [Diagram: forward engineering vs. actual development]
42. How do I understand code? [over the forward engineering diagram]
43. How do I understand code? How do I improve code?
44. I do not want to deal with metrics! How do I understand code? How do I improve code?
45. How to get an initial understanding of a system?
46. Metric values: LOC 35175, NOM 3618, NOC 384, CYCLO 5579, NOP 19, CALLS 15128, FANOUT 8590, AHH 0.12, ANDC 0.31
47. (the same table of metric values)
48. (the same table of metric values) And now what?
49. We need means to compare.
50. hierarchies? coupling?
51. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. Inheritance: ANDC 0.31, AHH 0.12. Size: NOP 19, NOC 384 (NOC/NOP 20.21), NOM 3618 (NOM/NOC 9.42), LOC 35175 (LOC/NOM 9.72), CYCLO 5579 (CYCLO/LOC 0.15). Communication: CALLS 15128 (CALLS/NOM 4.18), FANOUT 8590 (FANOUT/CALLS 0.56).
52. (the same pyramid, with the Size part highlighted)
53. (the same pyramid, with the Communication part highlighted)
54. (the same pyramid, with the Inheritance part highlighted)
55. (the same pyramid)
56. Reference thresholds. CYCLO/LOC: Java low 0.16, avg 0.20, high 0.24; C++ low 0.20, avg 0.25, high 0.30. LOC/NOM: Java 7/10/13; C++ 5/10/16. NOM/NOC: Java 4/7/10; C++ 4/9/15. ...
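The comparison step can be sketched as classifying a measured ratio against the Java thresholds above. The threshold values are the ones on the slide; the "closest band" logic is my own simplification of how the pyramid colors its values.

```python
# Java reference thresholds from the slide: (low, average, high).
JAVA_THRESHOLDS = {
    "CYCLO/LOC": (0.16, 0.20, 0.24),
    "LOC/NOM": (7, 10, 13),
    "NOM/NOC": (4, 7, 10),
}

def classify(ratio_name, value):
    """Label a ratio by the threshold band it lies closest to."""
    low, avg, high = JAVA_THRESHOLDS[ratio_name]
    bands = [("close to low", low), ("close to average", avg),
             ("close to high", high)]
    return min(bands, key=lambda band: abs(value - band[1]))[0]

# Ratios from the example pyramid above:
print(classify("CYCLO/LOC", 5579 / 35175))  # ~0.16 -> close to low
print(classify("LOC/NOM", 35175 / 3618))    # ~9.7 -> close to average
```

This is what turns the raw table of slide 46 into something interpretable: each number becomes "low", "average", or "high" relative to systems of the same language.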
57. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. (the pyramid with each value colored: close to high, close to average, close to low)
58. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. (legend: close to high, close to average, close to low)
59. I do not want to deal with metrics! How do I understand code? How do I improve code? [over the forward engineering diagram]
60. How do I improve code?
61. Quality is more than 0 bugs. Breaking design principles, rules and best practices deteriorates the code; it leads to design problems.
62. Imagine changing just a small design fragment
63. Imagine changing just a small design fragment
64. Imagine changing just a small design fragment and 33% of all classes would require changes
65. Design problems are expensive, frequent, unavoidable
66. Design problems are expensive, frequent, unavoidable. How to detect and eliminate them?
67. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes. Riel, 1996
68. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes.
69. God Classes centralize the intelligence of the system, do everything and use data from small data-classes.
70. God Classes are complex, are not cohesive, access external data.
71. God Classes are complex (WMC is high), are not cohesive (TCC is low), access external data (ATFD more than few).
72. God Classes are complex (WMC is high), are not cohesive (TCC is low), access external data (ATFD more than few). Compose metrics into queries using logical operators!
73. Detection Strategies are metric-based queries to detect design flaws. Lanza, Marinescu 2006. Rule 1 (METRIC 1 > Threshold 1) AND Rule 2 (METRIC 2 < Threshold 2) → Quality problem
74. A God Class centralizes too much intelligence in the system. Lanza, Marinescu 2006. (ATFD > FEW: class uses directly more than a few attributes of other classes) AND (WMC ≥ VERY HIGH: functional complexity of the class is very high) AND (TCC < ONE THIRD: class cohesion is low) → God Class
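The God Class strategy is a direct conjunction of three metric conditions, so it transcribes almost verbatim into code. The concrete constants below (FEW = 5, VERY HIGH = 47 for WMC) are assumptions for illustration; in practice you would use the statistically derived thresholds for your language and tune them.

```python
# Assumed threshold constants; tune for your codebase.
FEW = 5
WMC_VERY_HIGH = 47
ONE_THIRD = 1 / 3

def is_god_class(atfd, wmc, tcc):
    """God Class detection strategy: uses much foreign data,
    is very complex, and is not cohesive."""
    return atfd > FEW and wmc >= WMC_VERY_HIGH and tcc < ONE_THIRD

print(is_god_class(atfd=12, wmc=60, tcc=0.10))  # True: all three rules fire
print(is_god_class(atfd=2, wmc=60, tcc=0.10))   # False: little foreign data
```

The AND composition is what makes the query a cause-level diagnosis rather than a single symptom: a complex class is not suspicious by itself, but a complex, non-cohesive class feeding on foreign data is.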
75. An Envious Method is more interested in data from a handful of classes. Lanza, Marinescu 2006. (ATFD > FEW: method uses directly more than a few attributes of other classes) AND (LAA < ONE THIRD: method uses far more attributes of other classes than its own) AND (FDP ≤ FEW: the used “foreign” attributes belong to very few other classes) → Feature Envy
76. Data Classes are dumb data holders. Lanza, Marinescu 2006. (WOC < ONE THIRD: interface of class reveals data rather than offering services) AND (class reveals many attributes and is not complex) → Data Class
77. Data Classes are dumb data holders. Lanza, Marinescu 2006. The term “class reveals many attributes and is not complex” expands to: (NOAP + NOAM > FEW: more than a few public data, AND WMC < HIGH: complexity of class is not high) OR (NOAP + NOAM > MANY: class has many public data, AND WMC < VERY HIGH: complexity of class is not very high).
78. Shotgun Surgery depicts that a change in an operation triggers many (small) changes in a lot of different operations and classes. Lanza, Marinescu 2006
79. Code Duplication (3)
80. What is Code Duplication?
81. What is Code Duplication? And why is it a problem?
82. Code Duplication Detection: Lexical Equivalence, Syntactical Equivalence, Semantic Equivalence
83. Visualization of Copied Code Sequences [comparison matrices of File A vs. File B]
84. Transformation and comparison techniques. Johnson 94: lexical, substrings, string matching. Ducasse 99: lexical, normalized strings, string matching. Baker 95: syntactical, parameterized strings, string matching. Mayrand 96: syntactical, metrics tuples, discrete comparison. Kontogiannis 97: syntactical, metrics tuples, Euclidean distance. Baxter 98: syntactical, AST, tree matching.
85. Noise Elimination: whitespace is stripped so that layout differences disappear, e.g. `// assign same fastid as container`; `fastid = NULL;` → `fastid=NULL;`; `const char* fidptr = get_fastid();` → `constchar*fidptr=get_fastid();`; `if (fidptr != NULL) {` → `if(fidptr!=NULL){`; `int l = strlen(fidptr);` → `intl=strlen(fidptr);`; `fastid = new char[ l + 1 ];` → `fastid=newchar[l+1];`
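Noise elimination in the lexical approach can be sketched as a one-line normalization: strip all whitespace so that differently formatted copies of the same line compare equal. A real detector would also drop comments and possibly normalize identifiers; this sketch only handles spacing.

```python
import re

def normalize(line):
    """Remove all whitespace so layout noise cannot hide duplication."""
    return re.sub(r"\s+", "", line)

a = "const char* fidptr = get_fastid();"
b = "constchar*fidptr=get_fastid();"
print(normalize(a) == normalize(b))  # True: the lines match after cleanup
```

After normalization, plain string matching (as in Ducasse 99 above) is enough to find lines that differ only in formatting.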
86. Enhanced Simple Detection Approach
87. [Comparison matrix of lines “a b c d” against “a b c d”: a full diagonal of matches]
88. [Comparison matrix of “a b c d” against “a x y d”: only a and d match]
89. [Comparison matrix of “a b c” against “a b x y c”: the diagonal is broken by inserted lines]
90. [Comparison matrix of “a x b x c x d x” against itself: the repeated x lines produce spurious matches]
91. [Comparison matrix: lines from source 1 vs. lines from source 2]
92. [The same matrix]
93. [The same matrix, with an exact chunk highlighted]
94. [The same matrix, with an exact chunk and a line bias highlighted]
95. [The same matrix, with exact chunks separated by a line bias highlighted]
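The matrices in the slides above are dot plots: cell (i, j) is marked when line i of source 1 equals line j of source 2, and diagonal runs of marks are duplicated chunks. A minimal sketch:

```python
def dot_plot(src1, src2):
    """Boolean matrix: True where a line of src1 equals a line of src2.
    Blank lines are ignored to avoid trivial matches."""
    lines1, lines2 = src1.splitlines(), src2.splitlines()
    return [[l1 == l2 and l1.strip() != "" for l2 in lines2]
            for l1 in lines1]

# The "a b c" vs "a b x y c" example from the slides: the diagonal
# run a-b is broken by two inserted lines before c matches again.
src1 = "a\nb\nc"
src2 = "a\nb\nx\ny\nc"
for row in dot_plot(src1, src2):
    print("".join("*" if hit else "." for hit in row))
```

Chunk detection then reduces to finding diagonal runs of `True` cells, optionally allowing small gaps (the "line bias" of the later slides).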
96. Significant Duplication is large and should be looked at. Lanza, Marinescu 2006
97. 1 Metrics, 2 Design Problems, 3 Code Duplication
98. [Map of disharmonies and their relations: Identity Disharmonies (God Class, Brain Class, Brain Method, Feature Envy, Data Class, Significant Duplication), Collaboration Disharmonies (Shotgun Surgery, Intensive Coupling, Extensive Coupling), Classification Disharmonies (Refused Parent Bequest, Tradition Breaker, Futile Hierarchy), connected by has/uses/is relations. Lanza, Marinescu 2006]
99. Follow a clear and repeatable process
100. Follow a clear and repeatable process
101. Follow a clear and repeatable process
102. Follow a clear and repeatable process. Don’t reason about quality in terms of numbers!
103. QA is part of the Development Process. http://loose.upt.ro/incode
104. Tudor Gîrba, www.tudorgirba.com, creativecommons.org/licenses/by/3.0/