Ducasse's Maintenance Expertise


Published in: Technology
  1. 1. LSE. A Portfolio of Software Evolution Expertise. Stéphane Ducasse, stephane.ducasse@inria.fr, http://stephane.ducasse.free.fr/
  2. 2. A word of presentation. Co-author of Object-Oriented Reengineering Patterns; co-developer of Moose (reengineering platform); 10 PhD theses in reengineering; 50+ articles. Grounded in reality: was maintainer of Squeak 3.9; worked with Harman-Becker AG, Bedag AG, Nokia, Daimler.
  3. 3. Roadmap • Some facts • Our approach • Supporting maintenance • Moose, an open platform • Some visual examples • Conclusion
  4. 4. Software is complex. Project outcomes: 29% succeeded, 18% failed, 53% challenged (The Standish Group, 2004).
  13. 13. How large is your project? 1’000’000 lines of code * 2 seconds per line = 2’000’000 seconds; / 3600 ≈ 560 hours; / 8 hours per day ≈ 70 days; / 20 working days per month ≈ 3.5 months.
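The arithmetic on this slide can be replayed in a few lines. This is a minimal sketch: the function name `reading_time` is hypothetical, and the defaults (2 seconds per line, 8-hour days, 20 working days per month) are read off the slide's multiplier and divisors rather than stated by it.

```python
def reading_time(lines_of_code, seconds_per_line=2, hours_per_day=8, days_per_month=20):
    """Return (hours, days, months) spent at the given pace per line.

    All default values are assumptions inferred from the slide's numbers.
    """
    seconds = lines_of_code * seconds_per_line
    hours = seconds / 3600
    days = hours / hours_per_day
    months = days / days_per_month
    return hours, days, months


hours, days, months = reading_time(1_000_000)
print(f"{hours:.0f} hours, {days:.0f} days, {months:.1f} months")
```

The slide rounds the intermediate results upward (560 hours, 70 days); the unrounded chain ends slightly below 3.5 months.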
  14. 14. Maintenance is continuous development. Between 50% and 75% of global effort is spent on “maintenance”! Relative maintenance effort: 60.3% perfective (new functionality), 18.2% adaptive (new platforms or OS), 17.4% corrective (fixing reported errors), 4.1% other. The bulk of the maintenance cost is due to new functionality: even with better requirements, it is hard to predict new functions.
  15. 15. Lehman’s Software Evolution Laws. Continuous Change: “A program that is used in a real-world environment must change, or become progressively less useful in that environment.” Software Entropy: “As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.”
  16. 16. Roadmap • Some facts • Our approach • Supporting maintenance • Moose, an open platform • Some visual examples • Conclusion
  17. 17. Supporting the evolution of applications. A research goal and agenda grounded in reality: How can we help companies maintain their large software systems? What is the X-ray for software (code, people, practices)? Which analyses? How can you monitor your system (dashboards, ...)? How should extracted information be presented?
  18. 18. Covered topics. Topics: metamodeling, software metrics, program understanding, visualization, evolution analysis, duplicated code detection, code analysis, refactorings, tests. (Diagram labels: Analyses, Representation, Transformations, Reverse Engineering, Evolution.) Contributions: Moose, an open-source extensible reengineering environment (Lugano, Bern, Annecy, Anvers, Louvain la neuve, ULB, UTSL). Contacts: Harman-Becker (3 million lines of C++), Bedag (Cobol), Nokia, ABB, IMEC.
  19. 19. Research map (diagram; the pairing of topics and citations is not recoverable from the transcript). Axes: Analyses, Representation, Transformations, Reverse Engineering, Evolution. Topics: Software Metrics, Duplicated Code Identification, Understanding Large Systems, Group Identification, Static/Dynamic Information, Test Generation, Feature Analysis, Concept Identification, Class Understanding, Package Blueprints, Distribution Maps, Language Independent Refactorings, Language Independent Meta Model (FAMIX), Reengineering Patterns, Version Analyses, An Extensible Reengineering Environment (Moose), HISMO metamodel. Citations shown: [LMO99, OOPSLA00], [ICSM99, ICSM02], [WCRE99, TSI00, TSE03], [ASE03], [ICSM99], [CSMR 06], [JSME 06], [WCRE 06], [OOPSLA01, TSE04], [ICSM 07], [ICSM 06], [IWPSE 00], [UML99], [ICSM 05], [Models 06], [JSME 05].
  20. 20. One example: who is responsible for what? Pipeline: (1) extraction, (2) model, (3) analyses, (4) visualization. Shown: Distribution Map of authors on JBoss.
  21. 21. Moose is a reengineering tool that integrates multiple techniques: metrics (e.g. number of classes = 382, number of methods = 4268, …), visualization, queries and navigation, semantic analysis (word1, word2, …), evolution analysis.
  22. 22. Moose is open and open-source: meta-described and meta-model aware. (Diagram: meta-model entities Method, Class, Inheritance.)
  23. 23. Designed to be extensible. (Diagram labels: Class History, Duplication, Author, Version, Method, Class, File, Event, Inheritance, Trace.)
  24. 24. Roadmap • Some facts • Our approach • Supporting maintenance • Moose, an open platform • Some visual examples • Conclusion
  25. 25. Understanding large systems. Understanding code is difficult: systems are large and code is abstract. Should I really convince you? Some existing approaches: metrics (problem: once combined, you often get meaningless results); visualization (often beautiful but without meaning).
  26. 26. Polymetric views. Width: number of fields; height: number of methods; color: number of lines of code.
  27. 27. Polymetric views condense information. To get a feel for the inheritance semantics (adding vs. reusing): classes + inheritance, with width = number of added methods, height = number of overridden methods, color = number of extended methods. For methods: LOC, number of statements, number of parameters.
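As a rough illustration of how a polymetric view maps metrics onto a figure, the sketch below turns per-class metrics into a rectangle width, height and grey level, using the mapping of the first polymetric slide (width = fields, height = methods, color = lines of code). The `ClassMetrics` record, the `to_node` helper, the scaling unit and the sample classes are all hypothetical; this is not Moose's API.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    fields: int    # mapped to width
    methods: int   # mapped to height
    loc: int       # mapped to color (darker = more lines of code)


def to_node(m, max_loc, unit=4):
    """Map one class to (name, width, height, grey), grey in 0..255 (0 = black)."""
    grey = 255 - round(255 * m.loc / max_loc) if max_loc else 255
    # Keep a minimum size so classes with zero fields or methods stay visible.
    return (m.name, unit * max(m.fields, 1), unit * max(m.methods, 1), grey)


# Made-up sample data, for illustration only.
classes = [ClassMetrics("Parser", 3, 40, 1200), ClassMetrics("Token", 5, 6, 80)]
max_loc = max(c.loc for c in classes)
nodes = [to_node(c, max_loc) for c in classes]
for node in nodes:
    print(node)
```

A tall, narrow, dark box then reads as a class with many methods, few fields and a lot of code, which is the kind of at-a-glance reading the slide is after.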
  28. 28. Navigating Views...
  29. 29. Understanding classes. Understanding even a class is difficult!
  30. 30. Class Blueprint: an enriched call flow annotated with metrics to give semantics. Layers: Initialization, External Interface, Internal Implementation, Accessor, Attribute; edges show the invocation sequence.
  31. 31. Class Blueprint
  32. 32. Large delegating interface
  33. 33. Sharing Flows
  34. 34. Regular Subclasses
  35. 35. Patterns
  36. 36. How can we predict changes? Common wisdom holds that what changed yesterday will change today, but is it true? In the Sahara the weather is constant: there is a 90% chance that tomorrow is the same as today. In Belgium the weather changes really fast (sea influence): there is a 30% chance that tomorrow is the same as today.
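  37. 37. With history analysis we can get the climate of a software system. For each present version, the late changers of the past versions are compared with the early changers of the future versions; a non-empty overlap is a hit, and the hits are averaged over n-2 versions:
     $$YW_i(S, t_1, t_2) = \begin{cases} 1, & TopLENOM_{1..i}(S, t_1) \cap TopEENOM_{i..n}(S, t_2) \neq \emptyset \\ 0, & TopLENOM_{1..i}(S, t_1) \cap TopEENOM_{i..n}(S, t_2) = \emptyset \end{cases}$$
     $$YW(S, t_1, t_2) = \frac{\sum_i YW_i(S, t_1, t_2)}{n-2}$$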
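The YW measure on this slide can be sketched as follows. Several parts are assumptions, because the slide does not define them: the weighting used for "late" and "early" changers (halving per version here), the choice of number of methods as the measured property, and the range of the present version (2 to n-1, chosen to match the n-2 denominator). The function names and the sample histories are hypothetical.

```python
def deltas(nom):
    """Absolute change in number of methods between consecutive versions."""
    return [abs(b - a) for a, b in zip(nom, nom[1:])]


def latest_score(nom, i):
    """Changes within versions 1..i. ASSUMED weighting: recent changes count more."""
    d = deltas(nom[:i])
    return sum(x * 2.0 ** (k - len(d) + 1) for k, x in enumerate(d))


def earliest_score(nom, i):
    """Changes within versions i..n. ASSUMED weighting: early changes count more."""
    d = deltas(nom[i - 1:])
    return sum(x * 2.0 ** -k for k, x in enumerate(d))


def top(scores, t):
    """Names of the t highest-scoring classes, ignoring classes that never changed."""
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: -scores[c])
    return set(ranked[:t])


def yw(histories, t1, t2):
    """histories: {class name: [number of methods in version 1, ..., version n]}."""
    n = len(next(iter(histories.values())))
    hits = 0
    for i in range(2, n):  # ASSUMED range: present version i = 2 .. n-1
        late = top({c: latest_score(h, i) for c, h in histories.items()}, t1)
        early = top({c: earliest_score(h, i) for c, h in histories.items()}, t2)
        hits += 1 if late & early else 0
    return hits / (n - 2)


# Made-up histories over five versions, for illustration only.
histories = {"A": [1, 1, 1, 5, 9], "B": [3, 3, 3, 3, 3], "C": [2, 6, 6, 6, 6]}
print(yw(histories, t1=1, t2=1))
```

In this toy system only the last present version produces a hit, so the value is low: the classes that changed recently are mostly not the ones that change next, which corresponds to the "Belgium" end of the weather analogy.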
  38. 38. How do developers develop? • Is it more efficient to have people work together in the same office? • How can we optimize software development?
  39. 39. Who did that? (Chart axes: files, time.)
  40. 40. Line colors show which author owned which files in which period. (Legend: File A, File B; green author ownership; green author large commit; blue author small commit.)
  41. 41. Which author “possesses” which files?
  42. 42. Alphabetical order is no order!
  43. 43. Ordering based on similar commit signature. Patterns: Edit, Takeover, Monologue, Familiarization, Dialogue.
  44. 44. Understanding evolution of large systems • How old are the hierarchies? • How did the classes change? • How did the inheritance change?
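One way to derive "which author owned which file in which period" from a commit log is sketched below. The ownership rule is an assumption made for illustration (the owner is the author who has contributed the most changed lines so far); the slide does not state the rule. The commit tuple layout and the sample data are hypothetical.

```python
from collections import defaultdict


def ownership_timeline(commits):
    """commits: iterable of (time, author, file, lines_changed), in any order.

    Returns {file: [(time, owner), ...]} with one entry per change of owner.
    ASSUMPTION: the owner is the author with the most lines contributed so far.
    """
    lines = defaultdict(lambda: defaultdict(int))  # file -> author -> lines
    timeline = defaultdict(list)
    for time, author, file, changed in sorted(commits):
        lines[file][author] += changed
        owner = max(lines[file], key=lines[file].get)
        if not timeline[file] or timeline[file][-1][1] != owner:
            timeline[file].append((time, owner))
    return dict(timeline)


commits = [
    (1, "green", "FileA", 120),  # large commit: green takes ownership
    (2, "blue", "FileA", 5),     # small commit: ownership unchanged
    (3, "blue", "FileA", 200),   # blue overtakes green
    (1, "green", "FileB", 40),
]
timeline = ownership_timeline(commits)
print(timeline)
```

Each `(time, owner)` entry marks where the line color of a file would change in the view; small commits that do not change the owner would be drawn as marks on the existing line.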
  45. 45. Evolution holds useful information. (Diagram: a class hierarchy shown over successive versions along a time axis.) A is persistent; B is stable; C was removed; E is newborn; D inherited from C and then from A; …
  46. 46. Hierarchy Evolution Complexity View characterizes class hierarchy histories. (Legend: class histories annotated with ENOM, ENOS and Age; inheritance histories annotated with Age; removed histories are marked.) A is persistent; B is stable; C was removed; E is newborn; D inherited from C and then from A; …
  47. 47. Class hierarchies over 40 versions of Jun, a 3D framework with 740 classes.
  48. 48. Identifying Duplicated Code. “Parsing the program suite of interest requires a parser for the language dialect of interest. While this is nominally an easy task, in practice one must acquire a tested grammar for the dialect of the language at hand. Often for legacy codes, the dialect is unique and the developing organization will need to build their own parser. Worse, legacy systems often have a number of languages and a parser is needed for each. Standard tools such as Lex and Yacc are rather a disappointment for this purpose, as they deal poorly with lexical hiccups and language ambiguities.” [Baxter 98] Problems: unknown duplicated code, scalability, understanding.
  49. 49. Language independent. A language-independent, textual approach [ICSM’99], M. Rieger’s PhD thesis. Duploc handled Pascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C. Slower than other approaches, but it takes at most 45 minutes to adapt our approach to a new language. Between 3% and 10% less identification than parametrized match. (Figure: exact copies, a b c d e f against a b c d e f; copies with variations, a b c d e f against a b x y e f.)
  50. 50. A conceptual matrix: File A against File B. (Figure: exact copies, a b c d e f against a b c d e f; copies with variations, a b c d e f against a b x y e f.)
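A line-based comparison matrix of this kind can be sketched in a few lines. This is a simplified illustration, not Duploc itself: the normalization (dropping all whitespace), the helper names and the minimum run length are assumptions. The sample inputs are the two sequences shown on the slide.

```python
def normalize(line):
    """Language-independent normalization (ASSUMED): drop all whitespace."""
    return "".join(line.split())


def comparison_matrix(lines_a, lines_b):
    """matrix[i][j] is True when line i of A equals line j of B after normalization."""
    a = [normalize(line) for line in lines_a]
    b = [normalize(line) for line in lines_b]
    return [[x != "" and x == y for y in b] for x in a]


def diagonals(matrix, min_length=2):
    """Return (start_a, start_b, length) for each run of matches along a diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    runs = []
    for i in range(rows):
        for j in range(cols):
            # A run starts at a match whose upper-left neighbour is not a match.
            starts_run = matrix[i][j] and not (i and j and matrix[i - 1][j - 1])
            if starts_run:
                k = 0
                while i + k < rows and j + k < cols and matrix[i + k][j + k]:
                    k += 1
                if k >= min_length:
                    runs.append((i, j, k))
    return runs


file_a = ["a", "b", "c", "d", "e", "f"]
file_b = ["a", "b", "x", "y", "e", "f"]
exact = diagonals(comparison_matrix(file_a, file_a))
varied = diagonals(comparison_matrix(file_a, file_b))
print(exact)
print(varied)
```

An exact copy shows up as one unbroken diagonal, while a copy with variations shows up as a diagonal with a gap where the lines differ. Because only normalized text is compared, no parser is needed, which is what keeps the approach language independent.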
  51. 51. Entities that change together can reveal hidden dependencies.
     Measurements per entity and version:
            v1  v2  v3  v4  v5  v6
        A    2   3   3   3   4   6
        B    6   6   6   5   6   7
        C    3   3   5   5   8   9
        D    1   3   3   4   4   6
        E    4   5   5   6   6   6
     Resulting lattice, as (entities) (versions in which they changed together):
        (A,B,C,D,E) ()
        (A,B,C,D) (v6); (A,D,E) (v2)
        (A,B,C) (v5,v6); (D,E) (v2,v4); (A,D) (v2,v6)
        (D) (v2,v4,v6); (C) (v3,v5,v6); (A) (v2,v5,v6)
        () (v1,v2,v3,v4,v5,v6)
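The groups on this slide can be recomputed from the measurements with a brute-force sketch. The change criterion is an assumption inferred from the slide's data (an entity "changes" in a version when its value grows with respect to the previous version), and the function names are hypothetical. The enumeration is exponential in the number of versions, so it only suits small examples.

```python
from itertools import combinations

measurements = {  # values read off the slide, versions v1..v6
    "A": [2, 3, 3, 3, 4, 6],
    "B": [6, 6, 6, 5, 6, 7],
    "C": [3, 3, 5, 5, 8, 9],
    "D": [1, 3, 3, 4, 4, 6],
    "E": [4, 5, 5, 6, 6, 6],
}


def changed_in(measurements):
    """Map each version to the entities whose value grew in it (ASSUMED criterion)."""
    n = len(next(iter(measurements.values())))
    return {
        f"v{i + 1}": frozenset(e for e, m in measurements.items() if m[i] > m[i - 1])
        for i in range(1, n)
    }


def concepts(changes):
    """All (entities, versions) pairs where the entities changed together in exactly those versions."""
    everything = frozenset().union(*changes.values())
    versions = list(changes)
    found = set()
    for size in range(len(versions) + 1):
        for group in combinations(versions, size):
            shared = everything
            for v in group:
                shared = shared & changes[v]
            extent = frozenset(v for v in versions if shared <= changes[v])
            found.add((shared, extent))
    return found


changes = changed_in(measurements)
lattice = concepts(changes)
for entities, versions in sorted(lattice, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(entities), sorted(versions))
```

Under this criterion the sketch yields the same ten groups as the slide, for example A and D changing together in v2 and v6. The only difference is that v1 never appears, since the first version has no predecessor to compare against.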
  52. 52. How do properties spread in large systems? Properties: metrics, people, symbols/concepts. Spread: how many packages does a property touch? Focus: do packages and properties match? Distribution Map: a generic visualization.
  53. 53. Distribution Map
  54. 54. Ownership • Authors in JBoss
  55. 55. Characterizing packages: Butterflies [Metrics05], a kind of radar view.
  56. 56. Relative version
  57. 57. How to understand packages? Packages are key structuring elements, but they are complex: import classes, ... Package Blueprints [ICSM 2007].
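Spread and focus can be made concrete with a small sketch. Spread follows the slide directly (the number of packages a property touches). The focus formula is an assumption chosen for illustration, summing over packages the product of how much of the package the property covers and how much of the property lies in the package. The names and the sample data are hypothetical.

```python
def touch(a, b):
    """Fraction of b that also belongs to a."""
    return len(a & b) / len(b) if b else 0.0


def spread(prop, packages):
    """Number of packages containing at least one element with the property."""
    return sum(1 for members in packages.values() if prop & members)


def focus(prop, packages):
    """ASSUMED formula: sum over packages p of touch(prop, p) * touch(p, prop)."""
    return sum(touch(prop, p) * touch(p, prop) for p in packages.values())


# Made-up packages and ownership sets, for illustration only.
packages = {"ui": {"Button", "Window"}, "core": {"Model", "Store"}}
owned_by_alice = {"Button", "Window"}  # exactly one package
owned_by_bob = {"Window", "Model"}     # scattered over two packages

print(spread(owned_by_alice, packages), focus(owned_by_alice, packages))
print(spread(owned_by_bob, packages), focus(owned_by_bob, packages))
```

A property that coincides with one package gets spread 1 and focus 1.0; a property scattered across packages gets a higher spread and a lower focus.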
  58. 58. Surfaces represent package communication. (Figure: blueprint of the analyzed package P1, with one surface per referenced package P2, P3 and P4; each surface shows the classes in P1 that make references and the classes they reference.)
  59. 59. Principle. (Figure: the package under analysis, P1, and the packages P2, P3 and P4 it communicates with. The head of the blueprint shows internal references; the body shows external references, one surface per package. Within a surface, columns hold the internal referencing classes and the referenced classes, ordered from most to least.)
  60. 60. Example
  61. 61. Symbols contain domain information • What are the concepts used in an application? • How can we use symbolic information?
  62. 62. Looking at the symbols • Developers use meaningful names, which capture the domain knowledge.
  63. 63. A cluster is a group of documents which use the same terms.
  64. 64. Moose has been validated on real-life systems. Several large industrial case studies (NDA): Harman-Becker, Nokia, Daimler, Siemens. Different implementation languages (C++, Java, Smalltalk, Cobol); we use external C++ parsers. Different sizes. Moose is used in several research groups.
  65. 65. Possible New Research Directions • Remodularization • Clustering analysis • Open and Modular modules • Service Identification in Service Oriented Architecture • Architecture Extraction/Validation • Software Quality • Cost/Bugs prediction • EJB evaluation • Business rules extraction • Model transformation • Test
  66. 66. Evolution/maintenance is a challenge. Understanding and maintaining large and complex applications needs better tools and analyses. Moose is a platform for developing new analyses. Transfer to tool vendors.
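The idea of grouping documents that use the same terms can be sketched with plain term counts and cosine similarity. This is a deliberately simple stand-in, not necessarily the technique behind the slide: the identifier splitting, the greedy clustering, the threshold and the sample documents are all assumptions.

```python
import math
import re
from collections import Counter


def terms(source):
    """Split identifiers such as readRequest or max_size into lowercase terms."""
    words = re.findall(r"[A-Za-z][a-z]*", source.replace("_", " "))
    return Counter(w.lower() for w in words if len(w) > 1)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(documents, threshold=0.5):
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    vectors = {name: terms(text) for name, text in documents.items()}
    clusters = []
    for name, vec in vectors.items():
        for group in clusters:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Made-up "documents": the identifiers found in three classes.
documents = {
    "HttpServer": "openSocket readRequest sendResponse closeSocket",
    "HttpClient": "openSocket sendRequest readResponse closeSocket",
    "TaxReport": "computeTax roundAmount printReport",
}
clusters = cluster(documents)
print(clusters)
```

The two networking classes share their vocabulary and end up in one cluster, while the reporting class stays apart, which is the sense in which names carry domain knowledge.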