Software Evolution

A guest lecture I gave at the University of Saarland, Germany.
February 2008

  1. 1. Software Evolution • Michele Lanza, Faculty of Informatics, University of Lugano, Switzerland
  2. 2. Lugano?
  3. 3. Lugano..
  4. 4. Lugano!
  5. 5. REVEAL: Reverse Engineering Visualization Evolution Analysis Lab • Michele Lanza • Romain Robbes • Mircea Lungu • Marco D’Ambros • Richard Wettel
  6. 6. Contents • Introduction • Key Concepts • Mining Software Repositories • Visualization • Applications • The Evolution Matrix • CodeCity • A Bug’s Life • Undermining Software Evolution • Conclusion
  7. 7. Acknowledgements • 4 slides borrowed from Dr. Tudor Gîrba • 9 slides borrowed from Marco D’Ambros • 11 slides borrowed from Richard Wettel • 12 slides borrowed from Romain Robbes • Thanks to Prof. Zeller for having me here
  8. 8. Key Concepts
  9. 9. What is Software? “a program enables a computer to perform a specific task” “A computer program is a collection of instructions that describe a task, or set of tasks, to be carried out by a computer”
  10. 10. Some Facts About Software • Society increasingly relies on software • ...but it is unreliable and of low quality • Software is regarded like a classical engineering product • ...but it is more complex than any other human artifact • Maintenance is treated as a lowly activity • ...but 75% - 95% of cost is spent on maintenance • Software evolves due to business & technology drivers: systems that do not change are dead • Software evolution is crucial
  11. 11. How Large is Software? (lines of code) • Unix V7: 10,000 • Linux: 10,000 • Windows 3.1: 3 M • Windows NT: 4 M • Solaris 7: 12 M • Windows 95: 15 M • Red Hat 6.2: 17 M • Windows 98: 18 M • Red Hat 7.1: 30 M • Windows 2000: 40 M • Windows XP: > 45 M [Chart: lines of code in millions, plotted over the years 1990 to 2000]
  12. 12. How Much Software is There? • The total volume of software is estimated at 7’000’000’000 function points (FP) • 1 FP ~ 128 lines of C or 107 lines of COBOL • This means ca. 1 TLOC (1’000’000’000’000 lines) • Printed on paper we can wrap the planet 10 times • In what shape is it? • On average ca. 5 bugs / FP • This means ca. 35’000’000’000 bugs (6 per person)
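The arithmetic on slide 12 can be re-derived in a few lines. This is only a sketch: the world population of 6 billion is an assumption (it is not on the slide) chosen as a round figure that reproduces the "6 per person" estimate.

```python
# Back-of-the-envelope check of the figures on slide 12.
FUNCTION_POINTS = 7_000_000_000    # estimated total volume of software (FP)
LOC_PER_FP_C = 128                 # 1 FP ~ 128 lines of C
BUGS_PER_FP = 5                    # average bugs per function point
WORLD_POPULATION = 6_000_000_000   # assumption: round figure, not from the slide

total_loc = FUNCTION_POINTS * LOC_PER_FP_C    # close to 1 TLOC
total_bugs = FUNCTION_POINTS * BUGS_PER_FP    # ca. 35 billion
bugs_per_person = total_bugs / WORLD_POPULATION

print(f"lines of code:   {total_loc:.2e}")
print(f"bugs:            {total_bugs:,}")
print(f"bugs per person: {bugs_per_person:.1f}")
```

Using the C conversion factor gives roughly 0.9 TLOC, which the slide rounds to 1 TLOC.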
  13. 13. How Reliable is Software?? • Empirically • 1 error / 20 lines • Safety-critical Systems • 1 error / 100 lines • Wishful Thinking • 1 error / 1000 lines • The software that flies a Jumbo Jet • 8’000’000 lines: you do the math...
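Doing the math that slide 13 leaves to the audience: a small sketch that applies each of the three error rates to the 8’000’000 lines quoted for the Jumbo Jet software. The dictionary keys are just labels chosen for this example.

```python
# Expected number of errors in 8'000'000 lines under each rate on slide 13.
LINES = 8_000_000

lines_per_error = {
    "empirical": 20,
    "safety-critical": 100,
    "wishful thinking": 1000,
}

expected_errors = {name: LINES // n for name, n in lines_per_error.items()}

for name, errors in expected_errors.items():
    print(f"{name}: {errors:,} errors")
```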
  14. 14. Software is complex...
  15. 15. ...and it evolves! Mozilla: 3 MLOC, more than 1 million changes performed by hundreds of developers over more than 6 years
  16. 16. What is Evolution? “the accumulation of changes through succeeding generations of organisms that results in the emergence of new species”
  17. 17. Maintenance vs Evolution • XP sez: a system is *always* in evolution, there is no “maintenance” phase, only maintenance activity [Diagram: versions 1.0, 1.1, 1.1a and 2.0 connected by forward engineering steps, with the labels “Software Development”, “Software Maintenance” and “Software Evolution”]
  18. 18. Why Analyze Software Evolution? “Nevertheless, the industrial track record raises the question, why, despite so many advances, [...] • satisfactory functionality, performance and quality is only achieved over a lengthy evolutionary process • software maintenance never ceases until a system is scrapped • software is still generally regarded as the weakest link in the development of computer-based systems” Lehman et al., 1997
  19. 19. Software Entropy • Lehman’s “Laws of Software Evolution” • “Continuing Change” • “Increasing Entropy/Complexity” • “Increasing Size” • Maintenance increases “Software Entropy” • Erosion of architecture, design, modularization • Increase of interdependencies between parts (“Coupling”) • Decrease of separation of concerns (“Cohesion”)
  20. 20. A “small” example of software entropy
  21. 21. Software Evolution Analysis • Goal: Investigate the evolution of a software system to identify potential shortcomings in its architecture or logical structure • Structural shortcomings can then be subjected to reengineering or restructuring • Prerequisite: Reverse Engineering
  22. 22. Reverse Engineering
  23. 23. Reverse Engineering in Reality • During WW2, in 1944, three B-29 bombers had to land in Russia • The main US bomber provided the strategic advantage of reaching over the Pacific • Tremendously valuable, unknown to the Russians; building one from scratch would have taken 5 years • Approach: Disassemble, test, run • One was disassembled, one was used, one was a training model
  24. 24. Software Reverse Engineering • “The process of analysing a subject system to • identify the system’s components and their interrelationships, and • create representations of the system in another form or at a higher level of abstraction” [Chikofsky & Cross, 1990] • Why? To understand other people’s code (newcomers in the team, code reviewing, developers that left, etc.) • Generating UML diagrams is not reverse engineering...but it is a valuable support tool
  25. 25. Development: Hidden Chaos [Diagram: forward engineering; actual development]
  26. 26. Reengineering: Regaining control [Diagram: reverse engineering, program transformation, forward engineering]
  27. 27. Creating high level views: reverse engineering [Diagram: reverse engineering]
  28. 28. Coming back to Software Evolution Analysis • Software systems are not “just there”, they are evolved over time • “If you want to know who somebody is, you have to ask where he comes from” • Evolution information is the key to a holistic understanding of software • The major goals of software evolution analysis are to • Understand the evolutionary process • Predict the future evolution • This is done by mining software repositories
  29. 29. Mining Software Repositories
  30. 30. Mining Software Repositories? • Software evolution research relies on software repositories (think “CVS” or “Subversion”) • To answer the question “Who did what and when?” • ...but much more than that: Code, Effort, Bugs, Tests, Changes, e-Mails, Navigation, Web Sites, Traces, Chats, People, Specs, ...
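The question “Who did what and when?” can be illustrated with a tiny in-memory log. This is a sketch only: the `Commit` record and the sample data are hypothetical, and a real miner would first parse the output of a versioning system such as CVS or Subversion into records like these.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit record."""
    author: str
    day: date
    files: tuple


# Hypothetical sample data, not taken from any real repository.
LOG = [
    Commit("alice", date(2007, 1, 10), ("Parser.java", "Lexer.java")),
    Commit("bob", date(2007, 1, 12), ("Parser.java",)),
    Commit("alice", date(2007, 2, 3), ("Parser.java",)),
]


def who_did_what_and_when(log):
    """Map each file to its chronological list of (day, author) changes."""
    history = defaultdict(list)
    for commit in sorted(log, key=lambda c: c.day):
        for path in commit.files:
            history[path].append((commit.day, commit.author))
    return dict(history)


history = who_did_what_and_when(LOG)
for path, changes in history.items():
    print(path, [(day.isoformat(), author) for day, author in changes])
```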
  31. 31. Mining: Tools & Models • Tools to create Models • Tools to reason on the created Models [Diagram: models of Effort, Tests, People, Documentation, Code, Traces, e-Mails, Bugs, ...]
  32. 32. Models & Meta-Models • A Model is just that: a representation of a system • A Model always relies on a meta-model • One challenge: unify and connect meta-models [Diagram: a meta-model with the entities Package, Namespace, Class, Inheritance, Method, Attribute, Invocation and Access, connected by relations such as packagedIn, belongsTo, superclass, subclass, invokedBy, candidate, accessedIn and accesses; in a second version each entity is paired with a History]
  33. 33. ...and Tools again!
  34. 34. Software Evolution Visualization
  35. 35. Software Visualization “The use of the crafts of typography, graphic design, animation, and cinematography with modern human-computer interaction and computer graphics technology to facilitate both the human understanding and effective use of computer software” Stasko et al., 1998
  36. 36. Static Visualization
  37. 37. Dynamic Visualization
  38. 38. No Silver Bullet Visualization is only a means, not the end
  39. 39. Break
  40. 40. The Evolution Matrix A simple way to visualize evolution
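  41. 41. The Evolution Matrix - Principles [Diagram: one column per version (V1, V2, ..., Vn-1, Vn); classes drawn with the metrics NOM (number of methods) and NOA (number of attributes); annotations: age, Growth, Stagnation]
  42. 42. The Evolution Matrix shows Change • Idle class • Pulsar class • Supernova class • Persistent class • White dwarf class • Dayfly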
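The class categories on slide 42 can be approximated from one row of the matrix, i.e. the size of a class in every version. The sketch below is an illustration, not the original tool: the thresholds (tripling for a supernova, halving for a white dwarf) and the fallback labels "growing" and "shrinking" are assumptions made for this example.

```python
def is_persistent(nom_per_version):
    """A persistent class exists in every version of the system."""
    return all(n is not None for n in nom_per_version)


def classify(nom_per_version):
    """Classify one matrix row.

    nom_per_version holds the number of methods (NOM) of a class in each
    system version, or None where the class does not exist.
    """
    sizes = [n for n in nom_per_version if n is not None]
    if not sizes:
        raise ValueError("class never exists")
    if len(sizes) == 1:
        return "dayfly"            # lives for a single version
    steps = list(zip(sizes, sizes[1:]))
    deltas = [b - a for a, b in steps]
    if all(d == 0 for d in deltas):
        return "idle"              # never changes
    # Assumed threshold: size at least triples between two versions.
    if any(a > 0 and b >= 3 * a for a, b in steps):
        return "supernova"
    grew = any(d > 0 for d in deltas)
    shrank = any(d < 0 for d in deltas)
    if grew and shrank:
        return "pulsar"            # grows and shrinks repeatedly
    # Assumed threshold: the class lost at least half of its size.
    if shrank and sizes[-1] <= sizes[0] / 2:
        return "white dwarf"
    return "growing" if grew else "shrinking"


print(classify([10, 14, 9, 13, 8]))
print(classify([3, 4, 20, 21]))
```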
  43. 43. The Evolution Matrix Exemplified
  44. 44. CodeCity A fancy way to look at software
  45. 45. The city metaphor • Software → representation: classes → buildings; packages → districts; system → city • Class metric → building property: number of methods → height; number of attributes → width; number of attributes → length • Package metric → district property: nesting level → color saturation
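The mapping on slide 45 can be written down directly. A minimal sketch, not CodeCity's implementation: the type names, the minimum size of 1 (so empty classes stay visible) and the direction of the saturation scale are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    nom: int  # number of methods
    noa: int  # number of attributes


@dataclass
class Building:
    name: str
    height: int
    width: int
    length: int


def to_building(c: ClassMetrics) -> Building:
    """Map class metrics to building properties as listed on the slide."""
    base = max(1, c.noa)  # assumption: at least 1 so the building is visible
    return Building(c.name, height=max(1, c.nom), width=base, length=base)


def district_saturation(nesting_level: int, max_level: int) -> float:
    """Map a package's nesting level to a saturation in [0, 1].

    Assumption: deeper packages are drawn more saturated.
    """
    return nesting_level / max_level if max_level else 0.0


print(to_building(ClassMetrics("Example", nom=351, noa=3)))
```

A class with many methods and few attributes therefore shows up as a tall, thin tower, while a class with many attributes and no methods becomes a wide, flat building.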
  46. 46. ArgoUML City 2’522 classes 143 packages
  47. 47. Topology • Azureus: 4500+ classes • Nesting level: color saturation, reinforced by altitude (stacked platforms)
  48. 48. Scalability? VW Smalltalk 8000+ classes
  49. 49. Interacting in the City
  50. 50. What about Software Evolution? [Diagram: meta-model in three layers. HISTORY LAYER: ModelHistory, PackageHistory, ClassHistory, MethodHistory, AttributeHistory. VERSION LAYER: EntityVersion, ClassVersion, MethodVersion. SNAPSHOT LAYER: MooseModel, FAMIXPackage, FAMIXClass, FAMIXMethod, FAMIXAttribute. Each history has many versions (1 to *), and each version refers to one snapshot entity (versionEntity). Annotation: “from here... to here!”]
  51. 51. Same problem, more data • ArgoUML: 144 packages, 2’542 classes, 137’000 lines of code, sampled Oct 2002 to Feb 2007, sampling period variable, 9 samples, 13’535 revisions • JHotDraw: 72 packages, 998 classes, 30’000 lines of code, sampled Oct 2000 to Apr 2005, sampling period 1 week, 57 samples, 267 revisions • Jmol: 105 packages, 1’032 classes, 85’000 lines of code, sampled Jan 2000 to Aug 2007, sampling period 8 weeks, 50 samples, 8’065 revisions
  52. 52. ArgoUML, somewhere in time • Ver. 0.10.1 (9/10/2002) • Ver. 0.12 (18/08/2003) • Ver. 0.14 (5/12/2003) • Ver. 0.16 (19/07/2004) • Ver. 0.18.1 (30/04/2005) • Ver. 0.20 (9/02/2006) • Ver. 0.22 (8/08/2006) • Ver. 0.23.4 (10/12/2006) • Ver. 0.24 (12/02/2007) [Highlighted classes: ModelFacade, NSUMLModelFacade, Facade, FacadeMDRImpl]
  53. 53. ArgoUML Age map • Packages: org.argouml.language.cpp, org.argouml.language.php, org.argouml.language.csharp, org.argouml.language.java, org.argouml.model, org.argouml.uml.reveng.java • Classes (in reading order): STDCTokenTypes (NOA 152, NOM 0, AGE 4); FacadeMDRImpl (NOA 3, NOM 351, AGE 4); Facade (NOA 1, NOM 339, AGE 5); CPPParser (NOA 85, NOM 204, AGE 4); JavaRecognizer (NOA 24, NOM 91, AGE 9); JavaTokenTypes (NOA 146, NOM 0, AGE 9); JavaTokenTypes (NOA 175, NOM 0, AGE 9); JavaRecognizer (NOA 79, NOM 176, AGE 9)
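The three layers of this meta-model can be sketched with plain data classes. The class names below loosely follow the diagram, but the fields and helper methods are assumptions made for illustration; in particular, measuring age as the number of recorded versions is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ClassSnapshot:
    """Snapshot layer: a class as it appears in one model of the system."""
    name: str
    nom: int  # number of methods
    noa: int  # number of attributes


@dataclass
class ClassVersion:
    """Version layer: a snapshot placed in time."""
    version: str
    entity: ClassSnapshot


@dataclass
class ClassHistory:
    """History layer: the ordered versions of one class."""
    name: str
    versions: list = field(default_factory=list)

    def add(self, version: str, snapshot: ClassSnapshot) -> None:
        self.versions.append(ClassVersion(version, snapshot))

    def age(self) -> int:
        # Assumption: age = number of versions in which the class exists.
        return len(self.versions)

    def nom_growth(self) -> int:
        """Change in number of methods between first and last version."""
        if not self.versions:
            return 0
        return self.versions[-1].entity.nom - self.versions[0].entity.nom


# Hypothetical data for illustration only.
example = ClassHistory("Example")
example.add("0.1", ClassSnapshot("Example", nom=10, noa=2))
example.add("0.2", ClassSnapshot("Example", nom=14, noa=3))
print(example.age(), example.nom_growth())
```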
  54. 54. JHotDraw Age map
  55. 55. Jmol, The Time Machine
  56. 56. Tool Support: CodeCity • Implemented in Smalltalk, uses OpenGL for representation, uses a language-independent meta- model (FAMIX)
  57. 57. The EvoSpaces eclipse plugin
  58. 58. A Bug’s Life Treating bugs as first-level entities
  59. 59. An ideal bug’s life cycle Unconfirmed Verified New Resolved Closed Assigned
  60. 60. A less ideal bug’s life cycle Unconfirmed Verified New Resolved Closed Assigned Reopened
  61. 61. A real bug’s life cycle Unconfirmed Verified New Resolved Closed Assigned Reopened
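The three life cycles can be compared by checking a bug's recorded status history against a table of allowed transitions. This is a sketch under assumptions: the slides only name the states, so the transition tables below are a guess modeled on a typical bug tracker.

```python
# Assumed transitions for the "ideal" life cycle (states from slide 59).
IDEAL = {
    "Unconfirmed": {"New"},
    "New": {"Assigned"},
    "Assigned": {"Resolved"},
    "Resolved": {"Verified"},
    "Verified": {"Closed"},
    "Closed": set(),
}

# Assumed "less ideal" life cycle: finished bugs may be reopened.
LESS_IDEAL = {state: set(targets) for state, targets in IDEAL.items()}
for state in ("Resolved", "Verified", "Closed"):
    LESS_IDEAL[state].add("Reopened")
LESS_IDEAL["Reopened"] = {"Assigned", "Resolved"}


def unexpected_transitions(history, allowed):
    """Return the (from, to) steps in a status history that the model forbids."""
    return [
        (a, b)
        for a, b in zip(history, history[1:])
        if b not in allowed.get(a, set())
    ]


# A "real" history that skips the Reopened state.
real = ["Unconfirmed", "New", "Assigned", "Resolved", "New"]
print(unexpected_transitions(real, LESS_IDEAL))
```

Histories that produce a non-empty result are the ones that make the "real" life cycle messier than the model.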
  62. 62. Bug history from activities [Diagram: a bug is described by Problem (id, description, product, component), Criticality (severity, priority), Involved people (assignedTo, reporter, qa) and State (Status, Resolution); an Activity changes one of these fields, e.g. assignedTo from steve to john; the sequence of activities forms the bug history]
  63. 63. Mozilla’s bugs [Sep ‘98 - Apr ‘03] • #Bugs: 255’302 • #Activities: 2’706’201 [Charts: histogram of activities per bug (bins 0, 1-3, 4-5, 6-10, 11-20, 21-30, > 30; y axis 0% to 30%); histogram of lifetime, i.e. reported to last activity (bins 12 Hours, 1 Day, 1 Week, 1 Month, 6 Months, 1 Year, 2 Years, More; y axis 0% to 40%; annotation “> 50%”)]
  64. 64. The System Radiography View • “Where (in the system and in its history) are the open bugs located?” • Visualization principle • System decomposition on the y axis (Product :: Component) • Time on the x axis • (x,y) : (time, component) • Color: # open bugs in the time interval [Diagram: rows for Component 1 and Component 2 of Product A, then Product B, over time]
  65. 65. Mozilla example [Sep ‘98 - Apr ‘03] • Browser • Mailnews (Speaker note: add a transition to the next slide, possibly also in the movie)
  66. 66. The Bug Watch View • “How are bugs characterized with respect to their history?” • Visualization principle: 3 layers (Status, Activity, Severity) over time • Example: Beginning 10/19/1999, End 10/16/2001 • Status history (Status: From, To): Assigned: 10/19/99, 12/21/99; Resolved: 12/21/99, 1/31/00; Reopened: 1/31/00, 2/6/00; New: 2/6/00, 6/5/00; ...
  67. 67. Examples from Mozilla • Browser :: Networking [Nov ‘02 - Apr ‘03] • Reopened 4 times • Activities: • Developer in charge to fix it changed 6 times • Only one status (new) but many activities • Many people added in the CC • All addition of CC • Popular bug (Speaker notes: tell more about the clustering; say that we also found bugs that go from resolved to new or unconfirmed without passing through reopened; say what the size means)
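The data behind such a view is a grid of open-bug counts per (component, time interval). A minimal sketch, assuming monthly intervals and a hypothetical list of bugs; the actual tool's interval size and color scale are not stated on the slide.

```python
from collections import Counter
from datetime import date

# Hypothetical bugs: (product :: component, opened, closed or None if open).
BUGS = [
    ("Browser :: Networking", date(2002, 11, 5), date(2003, 1, 20)),
    ("Browser :: Networking", date(2002, 12, 1), None),
    ("Mailnews :: Filters", date(2003, 1, 10), date(2003, 1, 15)),
]


def next_month(d):
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def radiography(bugs, first, last):
    """Count, per (component, month), the bugs open at some point that month."""
    cells = Counter()
    start = date(first.year, first.month, 1)
    while start <= last:
        end = next_month(start)
        for component, opened, closed in bugs:
            if opened < end and (closed is None or closed >= start):
                cells[(component, start)] += 1
        start = end
    return cells


cells = radiography(BUGS, date(2002, 11, 1), date(2003, 2, 1))
for (component, month), count in sorted(cells.items()):
    print(component, month.isoformat(), count)
```

Each cell value would then be mapped to a color, with the component giving the y position and the month the x position.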
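The status layer is essentially a list of (status, from, to) intervals derived from the bug's status-change activities. The sketch below rebuilds the rows shown on the slide; the function name and input format are assumptions, and the history is cut off at 6/5/00 because the slide elides the later rows.

```python
from datetime import date


def status_intervals(initial_status, created, changes, end):
    """Turn status-change activities into (status, from, to) rows.

    changes is a list of (day, new_status) pairs sorted by day.
    """
    rows = []
    status, start = initial_status, created
    for day, new_status in changes:
        rows.append((status, start, day))
        status, start = new_status, day
    rows.append((status, start, end))
    return rows


# Dates taken from the slide (month/day/year).
changes = [
    (date(1999, 12, 21), "Resolved"),
    (date(2000, 1, 31), "Reopened"),
    (date(2000, 2, 6), "New"),
]
rows = status_intervals("Assigned", date(1999, 10, 19), changes, date(2000, 6, 5))

for status, start, stop in rows:
    print(f"{status:<9} {start.isoformat()} {stop.isoformat()}")
```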
  68. 68. Undermining Software Evolution Did we take the wrong turn?
  69. 69. Versioning Systems suck.. [UML class diagram in two versions: classes AbstractBar, Bar, Quux, ConcreteA, ConcreteB and Foo; members shown include stuff, other, factory(), asdf: int, Foo(c: int) and bar(): void]
  70. 70. Change as a first-class entity [Diagram: Integrated Development Environment, SpyWare plugin, Change repository; labels: “Event!”, “Query?”, “Computing...”, “Answer.”]
  71. 71. Changes can be composed • Atomic changes • Developer-level actions • Refactorings • Sessions • Entire system history [Diagram: sequences of atomic changes are summed into larger changes, e.g. Add class, Add method, Change method, Change method(s), Rename Method, Extract Method, Session]
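Composition of changes maps naturally onto a composite structure: atomic changes are leaves, and refactorings or sessions group them. A sketch for illustration only; the class names and the decomposition of Extract Method into "add method" plus "change method" are assumptions, not the change model actually used by the SpyWare plugin.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicChange:
    kind: str    # e.g. "add class", "add method", "change method"
    target: str  # hypothetical name of the changed entity

    def atoms(self):
        return [self]


@dataclass
class CompositeChange:
    name: str    # e.g. "Extract Method", "Session"
    parts: list = field(default_factory=list)

    def atoms(self):
        """Flatten the composite into its atomic changes, in order."""
        flat = []
        for part in self.parts:
            flat.extend(part.atoms())
        return flat


# Assumed decomposition of a refactoring into atomic changes.
extract = CompositeChange("Extract Method", [
    AtomicChange("add method", "Foo.helper"),
    AtomicChange("change method", "Foo.bar"),
])
session = CompositeChange("Session", [
    AtomicChange("add class", "Foo"),
    AtomicChange("add method", "Foo.bar"),
    extract,
])

print([change.kind for change in session.atoms()])
```

Because composites nest, the same `atoms()` call works for a single refactoring, a session, or an entire system history.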
  72. 72. Example Application: Understanding Development Sessions
  73. 73. [Figure legend for a development session: Date (e.g. 15/08/2006 17h17:29), Class Change (e.g. Added class Foo), Method Change (e.g. method m2), Refactorings, Number of Entities (Unique Number), Additions, Modifications, Removals, Duration]
  74. 74. There is a shift of focus among classes ChangePerformerTest, AdditionOperation, MethodNode, ScopeNode, TreeNode, ChangePerformer, DeletionOperation, Method, BlockNode, Argument, Temporary, AdditionOperation, TreeNode, ChangePerformer ScopeNode TreeNode Entity DeletionOperation Removal Let’s look at S , a painting and decoration session
  75. 75. Added arguments and temporaries to the parse tree ScopeNode TreeNode AddEntityChild: Argument isArgument Temporary isTemporary Entity
  76. 76. Added arguments and temporaries to Method. Changed how to handle the children of a method addArgument: Method addTemporary: ChangePerformer allChildren: allLocalVariables:
  77. 77. Changed how source code is generated MethodNode printTempsOn: BlockNode printArgumentsOn: ScopeNode printSourceCodeOn:
  78. 78. Potentially buggy method TreeNode ReorderChildrenWith:
  79. 79. Implementing tests and behavior for a difference algorithm ChangePerformerTest testEventsAndResults ChangePerformer processDeletionOperation: AdditionOperation performOn: DeletionOperation
  80. 80. Focusing on deletion AdditionOperation performOn: DeletionOperation deleteSubtreeAt: TreeNode isCutBranch Removal
  81. 81. Software Animator
  82. 82. Conclusion
  83. 83. Mining Repositories to Control Evolution • “Reverse Engineering with Logical Coupling”, Marco D’Ambros, Michele Lanza, In Proceedings of WCRE 2006 (13th Working Conference on Reverse Engineering), pp. 189 - 198, IEEE CS Press, 2006 • “Software Bugs and Evolution: A Visual Approach to Uncover their Relationship”, Marco D’Ambros, Michele Lanza, In Proceedings of CSMR 2006 (10th European Conference on Software Maintenance and Reengineering), pp. 227 - 236, IEEE CS Press, 2006 • “A Bug’s Life: Visualizing a Bug Database”, Marco D’Ambros, Michele Lanza, Martin Pinzger, In Proceedings of VISSOFT 2007 (4th IEEE International Workshop on Visualizing Software For Understanding and Analysis), pp. 113 - 120, IEEE CS Press, 2007 • “The Evolution Radar: Integrating Fine-grained and Coarse-grained Logical Coupling Information”, Marco D’Ambros, Michele Lanza, Mircea Lungu, In Proceedings of MSR 2006 (3rd International Workshop on Mining Software Repositories), pp. 26 - 32, 2006 • “Fractal Figures: Visualizing Development Effort for CVS Entities”, Marco D'Ambros, Michele Lanza, Harald Gall, In Proceedings of VISSOFT 2005 (3rd IEEE International Workshop on Visualizing Software For Understanding and Analysis), pp. 46 - 51, IEEE CS Press, 2005
  84. 84. Immersive Software Analysis • “Program Comprehension through Software Habitability”, Richard Wettel, Michele Lanza, In Proceedings of ICPC 2007 (15th International Conference on Program Comprehension), pp. 231 - 240, IEEE CS Press, 2007 • “Visualizing Software Systems as Cities”, Richard Wettel, Michele Lanza, In Proceedings of VISSOFT 2007 (4th International Workshop on Visualizing Software for Understanding and Analysis), pp. 92 - 99, IEEE CS Press, 2007
  85. 85. Change-based Software Evolution • “A Change-based Approach to Software Evolution”, Romain Robbes, Michele Lanza, In ENTCS, vol. 166, pp. 93 - 109, Jan 2007, Elsevier Science Direct • “Characterizing and Understanding Development Sessions”, Romain Robbes, Michele Lanza, In Proceedings of ICPC 2007 (15th International Conference on Program Comprehension), pp. 155 - 164, IEEE CS Press, 2007 • “An Approach to Software Evolution Based on Semantic Change”, Romain Robbes, Michele Lanza, Mircea Lungu, In Proceedings of FASE 2007 (10th ETAPS Conference on Fundamental Approaches to Software Engineering), pp. 27 - 41, Springer LNCS, 2007 • “Mining a Change-based Repository”, Romain Robbes, In Proceedings of MSR 2007 (4th International Workshop on Mining Software Repositories), IEEE CS Press, 2007 • “Change-based Software Evolution”, Romain Robbes, Michele Lanza, In Proceedings of EVOL 2006 (1st International Workshop on Software Evolution), pp. 159 - 164, 2006 • “Versioning Systems for Evolution Research”, Romain Robbes, Michele Lanza, In Proceedings of IWPSE 2005 (8th International Workshop on Principles of Software Evolution), pp. 155 - 164, IEEE CS Press, 2005
  86. 86. Visual Architecture Reconstruction • “Reverse Engineering Super-Repositories”, Mircea Lungu, Michele Lanza, Tudor Gîrba, Reinout Heeck, In Proceedings of WCRE 2007 (14th Working Conference on Reverse Engineering), to be published, IEEE CS Press, 2007 • “Exploring Inter-Module Relationships in Evolving Software Systems”, Mircea Lungu, Michele Lanza, In Proceedings of CSMR 2007 (11th European Conference on Software Maintenance and Reengineering), pp. 91 - 100, IEEE CS Press, 2007 • “Package Patterns for Visual Architecture Recovery”, Mircea Lungu, Michele Lanza, Tudor Gîrba, In Proceedings of CSMR 2006 (10th European Conference on Software Maintenance and Reengineering), pp. 227 - 236, IEEE CS Press, 2006 • “Interactive Exploration of Semantic Clusters”, Mircea Lungu, Adrian Kuhn, Tudor Gîrba, Michele Lanza, In Proceedings of VISSOFT 2005 (3rd International Workshop on Visualizing Software for Understanding and Analysis), pp. 95 - 100, IEEE CS Press, 2005 • “A Small Observatory for Super-Repositories”, Mircea Lungu, Tudor Gîrba, In Proceedings of IWPSE 2007 (10th International Workshop on Principles of Software Evolution), pp. 106 - 109, IEEE CS Press, 2007 • “Softwarenaut: Cutting the Edge in Software Visualization”, Mircea Lungu, Michele Lanza, In Proceedings of Softvis 2006 (3rd International Symposium on Software Visualization), pp. 179 - 180, ACM Press, 2006
  87. 87. Take-away • Software is not written, it’s being evolved (by people) • Fully understanding software is only possible if one takes into account evolutionary information • Evolution analysis = mining software repositories • Mining software repositories = tools & models • A myriad of approaches exist • Long-term goal: holistic understanding of software • And: assisting the developer • Last but not least: an exciting, still under-researched, field of software engineering
  88. 88. Thank you - Questions?
  89. 89. Michele Lanza http://www.inf.unisi.ch/lanza/ creativecommons.org/licenses/by/3.0/
