Ducasse's Maintenance Expertise

  • 1. LSE A Portfolio of Software Evolution Expertise Stéphane Ducasse stephane.ducasse@inria.fr http://stephane.ducasse.free.fr/ Stéphane Ducasse 1
  • 2. A word of introduction: Co-author of Object-Oriented Reengineering Patterns; co-developer of Moose (reengineering platform); 10 PhD theses in reengineering; 50+ articles. Grounded in reality: was maintainer of Squeak 3.9; worked with Harman-Becker AG, Bedag AG, Nokia, Daimler
  • 3. Roadmap • Some facts • Our approach • Supporting maintenance • Moose, an open platform • Some visual examples • Conclusion
  • 4. Software is complex. 29% Succeeded 18% Failed 53% Challenged The Standish Group, 2004 LSE S.Ducasse 4
  • 5. How large is your project? LSE S.Ducasse 5
  • 13. How large is your project? 1’000’000 lines of code * 2 seconds per line = 2’000’000 seconds; / 3600 ≈ 560 hours; / 8 hours per day ≈ 70 days; / 20 working days per month ≈ 3.5 months
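The back-of-the-envelope estimate above can be checked with a short sketch. The units are a reading of the slide's divisors and are assumptions: 2 seconds to read one line, 8 working hours per day, 20 working days per month. The function name `reading_effort` is hypothetical.

```python
def reading_effort(lines_of_code, seconds_per_line=2,
                   hours_per_day=8, working_days_per_month=20):
    """Return (seconds, hours, days, months) needed just to read the code once.

    All default rates are assumptions taken from the divisors on the slide.
    """
    seconds = lines_of_code * seconds_per_line
    hours = seconds / 3600
    days = hours / hours_per_day
    months = days / working_days_per_month
    return seconds, hours, days, months


seconds, hours, days, months = reading_effort(1_000_000)
print(f"{seconds} s = {hours:.1f} h = {days:.1f} days = {months:.1f} months")
```

Without rounding, the figures come out slightly below the slide's rounded 560 hours and 70 days, and the last step gives roughly three and a half months of pure reading time.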
  • 14. Maintenance is Continuous Development. Between 50% and 75% of global effort is spent on “maintenance”! Relative maintenance effort: 60.3% Perfective (new functionality); 18.2% Adaptive (new platforms or OS); 17.4% Corrective (fixing reported errors); 4.1% Other. The bulk of the maintenance cost is due to new functionality; even with better requirements, it is hard to predict new functions
  • 15. Lehman’s Software Evolution Laws Continuous Change: “A program that is used in a real-world environment must change, or become progressively less useful in that environment.” Software Entropy: “As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.” LSE S.Ducasse 7
  • 17. Supporting the evolution of applications. A research goal and agenda grounded in reality: How can we help companies maintain their large software systems? What is the X-ray for software (code, people, practices)? Which analyses? How can you monitor your system (dashboards...)? How to present the extracted information?
  • 18. Covered topics. Topics: Metamodeling, Software metrics, Program understanding, Visualization, Evolution analysis, Duplicated code detection, Code Analysis, Refactorings, Tests. [Diagram labels: Analyses, Reverse Engineering, Representation, Transformations, Evolution] Contributions: Moose, an open-source extensible reengineering environment (Lugano, Bern, Annecy, Anvers, Louvain la neuve, ULB, UTSL). Contacts: Harman-Becker (3 million lines of C++), Bedag (Cobol), Nokia, ABB, IMEC
  • 19. Software Metrics [LMO99, OOPSLA00] Duplicated Code Identification Understanding Large Systems [ICSM99, ICSM02] Group Identification [WCRE99, TSI00, TSE03] Static/Dynamic Information [ASE03] Test Generation [ICSM99] Feature Analysis [CSMR 06] Concept Identification [JSME 06] Analyses [WCRE 06] Class Understanding [OOPSLA01,TSE04] Package Blueprints Reverse [ICSM 07] Engineering Distribution Maps [ICSM 06] Representation Transformations Language Independent Refactorings [IWPSE 00] Evolution Language Independent Meta Model (FAMIX) Reengineering Patterns [UML99] Version Analyses An Extensible Reengineering [ICSM 05] Environment (Moose) HISMO metamodel [Models 06] [JSME 05] LSE S.Ducasse 11
  • 20. One example: who is responsible for what? (1) Extraction, (2) Model, (3) Analyses, (4) Visualisation. Distribution Map of authors on JBoss
  • 21. Moose is a reengineering tool which integrates multiple techniques: Metrics (Number of classes = 382, Number of methods = 4268, …), Visualization, Queries and Navigation, Semantic Analysis (word1, word2, …), Evolution Analysis
  • 22. Moose is open and open-source meta-described meta-model aware Method Class Inheritance LSE S.Ducasse 14
  • 23. Designed to be extensible Class History Duplication Class Author Version Method Class File Event Inheritance Trace LSE S.Ducasse 15
  • 25. Understanding large systems. Understanding code is difficult! Systems are large. Code is abstract. Should I really convince you? Some existing approaches: Metrics (problem: you often get meaningless results once they are combined); Visualization (often beautiful but without meaning)
  • 26. Polymetric views W: # fields H: # methods C: # lines of code LSE S.Ducasse 18
  • 27. Polymetric views condense information. To get a feel of the inheritance semantics: adding vs. reusing. Classes+Inheritance view: W: # of Added Methods, H: # of Overridden Methods, C: # of Extended Methods. Method view: LOC, # statements, # parameters
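A minimal sketch of the idea behind a polymetric view, using the mapping W: # fields, H: # methods, C: # lines of code. The data classes, the `polymetric_nodes` function, the pixel scaling and the grayscale encoding (darker means more lines of code) are all assumptions made for illustration; they are not the Moose API.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    fields: int
    methods: int
    lines_of_code: int


@dataclass
class Node:
    name: str
    width: int   # encodes the number of fields
    height: int  # encodes the number of methods
    gray: int    # encodes lines of code: 255 = white (small), 0 = black (largest)


def polymetric_nodes(classes, min_size=4, scale=2):
    """Turn per-class metrics into drawable rectangles (hypothetical layout-free sketch)."""
    max_loc = max((c.lines_of_code for c in classes), default=0)
    nodes = []
    for c in classes:
        # Normalize lines of code against the largest class to pick a shade.
        gray = 255 if max_loc == 0 else round(255 * (1 - c.lines_of_code / max_loc))
        nodes.append(Node(c.name,
                          min_size + scale * c.fields,
                          min_size + scale * c.methods,
                          gray))
    return nodes


nodes = polymetric_nodes([
    ClassMetrics("Parser", fields=3, methods=40, lines_of_code=1200),
    ClassMetrics("Token", fields=5, methods=6, lines_of_code=0),
])
for node in nodes:
    print(node)
```

A tall, narrow, dark rectangle then stands out at a glance as a class with many methods, few fields and a lot of code, which is the kind of condensed reading the slide describes.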
  • 28. Navigating Views... LSE S.Ducasse 20
  • 29. Understanding classes Understanding even a class is difficult! LSE S.Ducasse 21
  • 30. Class Blueprint Enriched call flow annotated with metrics to give semantics Initialization External Interface Internal Implementation Accessor Attribute Invocation Sequence LSE S.Ducasse 22
  • 31. Class Blueprint LSE S.Ducasse 23
  • 32. Large delegating interface LSE S.Ducasse 24
  • 33. Sharing Flows LSE S.Ducasse 25
  • 34. Regular Subclasses LSE S.Ducasse 26
  • 35. Patterns LSE S.Ducasse 27
  • 36. How can we predict changes? Common wisdom holds that what changed yesterday will change today, but is it true? In the Sahara the weather is constant: there is a 90% chance that tomorrow is the same as today. In Belgium the weather changes really fast (sea influence): there is a 30% chance that tomorrow is the same as today
  • 37. With history analysis we can get the climate of a software system
    $$YW_i(S, t_1, t_2) = \begin{cases} 1, & TopLENOM_{1..i}(S, t_1) \cap TopEENOM_{i..n}(S, t_2) \neq \emptyset \\ 0, & TopLENOM_{1..i}(S, t_1) \cap TopEENOM_{i..n}(S, t_2) = \emptyset \end{cases}$$
    $$YW(S, t_1, t_2) = \frac{\sum_i YW_i(S, t_1, t_2)}{n - 2}$$
    [Figure labels: Past, Late Changers; Future, Early Changers; past versions, present version, future versions; hit]
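The YW measure can be sketched as follows. The slide does not define LENOM and EENOM, so the weighting used here is an assumption: changes in the number of methods are summed, with recent changes weighted more for the late changers and early changes weighted more for the early changers. The function names, the 0-based version indices and the reading of t1 and t2 as "how many top entries to keep" are also assumptions.

```python
def lenom(nom, j, k):
    """Late changes between versions j..k: recent changes weigh more (assumed weighting)."""
    return sum(abs(nom[i] - nom[i - 1]) * 2.0 ** (i - k) for i in range(j + 1, k + 1))


def eenom(nom, j, k):
    """Early changes between versions j..k: early changes weigh more (assumed weighting)."""
    return sum(abs(nom[i] - nom[i - 1]) * 2.0 ** (j - i + 1) for i in range(j + 1, k + 1))


def top(scores, t):
    """Names of the t highest-scoring entities, ignoring entities that never changed."""
    changed = [name for name, score in scores.items() if score > 0]
    changed.sort(key=lambda name: (-scores[name], name))
    return set(changed[:t])


def yesterdays_weather(history, t1, t2):
    """Fraction of interior versions where a late changer is also an early changer.

    history maps a class name to its number of methods in each version;
    every list must have the same length n >= 3.
    """
    n = len(next(iter(history.values())))
    hits = 0
    for i in range(1, n - 1):  # the n - 2 versions that have both a past and a future
        late = top({c: lenom(v, 0, i) for c, v in history.items()}, t1)
        early = top({c: eenom(v, i, n - 1) for c, v in history.items()}, t2)
        if late & early:
            hits += 1
    return hits / (n - 2)


history = {"A": [1, 2, 3, 3, 3], "B": [1, 1, 1, 2, 3]}
print(yesterdays_weather(history, t1=1, t2=1))
```

A value close to 1 corresponds to the Sahara case (what changed recently keeps changing), a low value to the Belgian one.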
  • 38. How do developers develop? • Is it more efficient to have people work together in the same office? • How can we optimize software development?
  • 39. Who did that? Files Time LSE S.Ducasse 31
  • 40. Line colors show which author owned which files in which period. [Figure labels: File A, File B; Green author ownership; Green author large commit; Blue author small commit]
  • 41. Which author “possesses” which files? LSE S.Ducasse 33
  • 42. Alphabetical order is no order! LSE S.Ducasse 34
  • 43. Based on similar commit signature Edit Takeover Monologue Familiarization Dialogue LSE S.Ducasse 35
  • 44. Understanding evolution of large systems • How old are the hierarchies? • How did the classes change? • How did the inheritance change? LSE S.Ducasse 36
  • 45. Evolution holds useful information. [Figure: a class hierarchy with classes A, B, C, D shown at successive points in time] A is persistent; C was removed; B is stable; E is newborn; D inherited from C and then from A; …
  • 46. Hierarchy Evolution Complexity View characterizes class hierarchy histories. [Figure legend: Class History (ENOM, ENOS, Age, Removed); Inheritance History (Age, Removed)] A is persistent; C was removed; B is stable; E is newborn; D inherited from C and then from A; …
  • 47. Class hierarchies over 40 versions of Jun, a 3D framework of 740 classes
  • 48. Identifying Duplicated Code “Parsing the program suite of interest requires a parser for the language dialect of interest. While this is nominally an easy task, in practice one must acquire a tested grammar for the dialect of the language at hand. Often for legacy codes, the dialect is unique and the developing organization will need to build their own parser. Worse, legacy systems often have a number of languages and a parser is needed for each. Standard tools such as Lex and Yacc are rather a disappointment for this purpose, as they deal poorly with lexical hiccups and language ambiguities.” [Baxter 98] Problems Unknown Duplicated Code Scalability Understanding LSE S.Ducasse 40
  • 49. Language Independent. Language independent, textual [ICSM’99], M. Rieger’s PhD thesis. Duploc handled Pascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C. Slower than other approaches but... max 45 min to adapt our approach to a new language. Between 3% and 10% less identification than parametrized match. [Figure: Exact Copies (a b c d e f / a b c d e f); Copies with Variations (a b c d e f / a b x y e f)]
  • 50. A Conceptual Matrix. [Figure: comparison matrices with File A and File B as axes. Exact Copies: a b c d e f / a b c d e f. Copies with Variations: a b c d e f / a b x y e f]
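The matrix idea can be sketched in a few lines: compare every line of File A with every line of File B and look for diagonals of matches. This is an illustration only, not Duploc's implementation. Stripping all whitespace before comparing and requiring at least two consecutive matching lines are assumptions, and the function names are hypothetical.

```python
def normalize(line):
    """Remove all whitespace so that layout changes do not hide a copy (assumed normalization)."""
    return "".join(line.split())


def comparison_matrix(lines_a, lines_b):
    """matrix[i][j] is True when line i of A equals line j of B after normalization."""
    norm_a = [normalize(line) for line in lines_a]
    norm_b = [normalize(line) for line in lines_b]
    return [[a == b and a != "" for b in norm_b] for a in norm_a]


def diagonals(matrix, min_length=2):
    """Return (start_in_a, start_in_b, length) for each maximal diagonal run of matches."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs = []
    for i in range(rows):
        for j in range(cols):
            if not matrix[i][j]:
                continue
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue  # this cell continues a run that started earlier
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_length:
                runs.append((i, j, length))
    return runs


file_a = ["a", "b", "c", "d", "e", "f"]
exact_copy = ["a", "b", "c", "d", "e", "f"]
varied_copy = ["a", "b", "x", "y", "e", "f"]
print(diagonals(comparison_matrix(file_a, exact_copy)))   # one unbroken diagonal
print(diagonals(comparison_matrix(file_a, varied_copy)))  # a diagonal with a hole
```

An exact copy shows up as one unbroken diagonal, while a copy with variations shows up as a diagonal with a gap where the lines differ. Because only text is compared, no parser is needed for the language at hand.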
  • 51. Entities that change together can reveal hidden dependencies. Matrix (rows: entities, columns: versions v1 v2 v3 v4 v5 v6): A: 2 3 3 3 4 6; B: 6 6 6 5 6 7; C: 3 3 5 5 8 9; D: 1 3 3 4 4 6; E: 4 5 5 6 6 6. Lattice of (entities) (versions), from top to bottom: (A,B,C,D,E) (); (A,B,C,D) (v6); (A,D,E) (v2); (A,B,C) (v5,v6); (D,E) (v2,v4); (A,D) (v2,v6); (D) (v2,v4,v6); (C) (v3,v5,v6); (A) (v2,v5,v6); () (v1,v2,v3,v4,v5,v6)
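A small sketch of how the co-change groups can be derived from the matrix on this slide. It assumes that an entity "changes" in a version when its value increases compared with the previous version; the slide does not state the criterion, but this reading reproduces the groups it shows. The function names are hypothetical, and the sketch only intersects change sets rather than building the full lattice.

```python
# Values per entity for versions v1..v6, copied from the slide.
history = {
    "A": [2, 3, 3, 3, 4, 6],
    "B": [6, 6, 6, 5, 6, 7],
    "C": [3, 3, 5, 5, 8, 9],
    "D": [1, 3, 3, 4, 4, 6],
    "E": [4, 5, 5, 6, 6, 6],
}


def change_versions(values):
    """Versions in which the value grew (assumed change criterion)."""
    return {f"v{k + 1}" for k in range(1, len(values)) if values[k] > values[k - 1]}


def shared_changes(history, group):
    """Versions in which every entity of the group changed."""
    return set.intersection(*(change_versions(history[name]) for name in group))


def changed_in(history, version):
    """Entities that changed in the given version."""
    return {name for name, values in history.items()
            if version in change_versions(values)}


print(sorted(shared_changes(history, ["A", "B", "C"])))
print(sorted(changed_in(history, "v2")))
```

Entities that repeatedly appear together in `changed_in` for several versions are candidates for a hidden dependency, even if no explicit reference links them in the code.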
  • 52. How do properties spread in large systems? Properties: metrics, people, symbols/concepts. Spread = how many packages does a property touch? Focus = do packages and properties match? Distribution Map: a generic visualization
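Spread and focus can be illustrated with a short sketch. Spread follows the slide directly (the number of packages a property touches). The slide gives no formula for focus, so the score below is an assumption chosen for illustration: for each package, multiply the share of the package that has the property by the share of the property that lies in the package, and sum the products. Package and class names are made up.

```python
def spread(property_members, packages):
    """Number of packages that contain at least one class with the property."""
    return sum(1 for members in packages.values() if members & property_members)


def focus(property_members, packages):
    """Assumed focus score in [0, 1]: 1.0 when the property matches one package exactly."""
    score = 0.0
    for members in packages.values():
        shared = len(members & property_members)
        if shared:
            score += (shared / len(members)) * (shared / len(property_members))
    return score


# Hypothetical system: two packages and the classes each author owns.
packages = {"ui": {"A", "B"}, "core": {"C", "D", "E", "F"}}
owned_by = {"alice": {"A", "B"}, "carol": {"A", "C"}}

for author, classes in owned_by.items():
    print(author, spread(classes, packages), focus(classes, packages))
```

In this made-up data, alice is well encapsulated (one package, fully matched), while carol is spread thinly over both packages.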
  • 53. Distribution Map LSE S.Ducasse 45
  • 54. Ownership • Authors in JBoss LSE S.Ducasse 46
  • 55. Characterizing Packages Butterflies [Metrics05] Kind of Radar LSE S.Ducasse 47
  • 56. Relative version LSE S.Ducasse 48
  • 57. How to understand Packages Packages are key structuring elements But complex: import classes.... Package Blueprints [ICSM 2007] LSE S.Ducasse 49
  • 58. Surfaces represent package communication. [Figure: blueprint of P1, the analyzed package, with one surface per referenced package (P2 surface, P3 surface, P4 surface); labels: classes in P1 that do references, referenced classes]
  • 59. Principle. [Figure: construction of the blueprint for P1, the package under analysis, which references packages P2, P3 and P4. Labels: head (internal references; internal referencing classes, internal referenced classes), body (external references; external referenced classes), each ordered from most to least]
  • 60. Example LSE S.Ducasse 52
  • 61. Symbols contain domain information • What are the concepts used in an application? • How can we use symbolic information? LSE S.Ducasse 53
  • 62. Looking at the Symbols • Developers use meaningful names, which capture the domain knowledge. LSE S.Ducasse 54
  • 63. A cluster is a group of documents which use the same terms LSE S.Ducasse 55
  • 64. Moose has been validated on real life systems Several large, industrial case studies (NDA) Harman-Becker Nokia Daimler Siemens Different implementation languages (C++, Java, Smalltalk, Cobol) We use external C++ parsers Different sizes Moose is used in several research groups LSE S.Ducasse 56
  • 65. Possible New Research Directions • Remodularization • Clustering analysis • Open and Modular modules • Service Identification in Service Oriented Architecture • Architecture Extraction/Validation • Software Quality • Cost/Bugs prediction • EJB evaluation • Business rules extraction • Model transformation • Test LSE S.Ducasse 57
  • 66. Evolution/Maintenance is a challenge Understanding and maintaining large and complex applications needs better tools/analyses Moose is a platform for developing new analyses Transfer to tool vendors LSE S.Ducasse 58