Moose Tutorial at WCRE 2008

I used this set of slides for the Moose tutorial at WCRE 2008

Usage Rights

CC Attribution License

Moose Tutorial at WCRE 2008: Presentation Transcript

  • Moose Tutorial Tudor Gîrba www.tudorgirba.com
  • [Diagram: forward engineering, reverse engineering, actual development]
  • built in Berne
  • used in several research groups; > 100 man-years of effort; ~ 150 publications; since 1997
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, and a collaboration
  • [Figure: source code reduced to metrics and queries, e.g. McCabe = 21, NOM, LOC, classes select: #isGodClass]
  • Metrics compress the system into numbers NOM NOC DUPLINES LOC NOCmts NAI TCC NOPA NOA WMC WLOC NI CYCLO WNOC ... ATFD WOC HNL MSG
  • Detection Strategies are metric-based queries to detect design flaws: (Rule 1: METRIC 1 > Threshold 1) AND (Rule 2: METRIC 2 < Threshold 2) indicate a quality problem (Lanza, Marinescu 2006)
  • Example: a God Class centralizes too much intelligence in the system. The class directly uses more than a few attributes of other classes (ATFD > FEW) AND the functional complexity of the class is very high (WMC ≥ VERY HIGH) AND class cohesion is low (TCC < ONE THIRD) (Lanza, Marinescu 2006)
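
The God Class rule above can be sketched as a small metric query. This is a minimal illustration in Python, not the Moose implementation: the `ClassMetrics` record and the numeric values chosen for FEW and VERY HIGH are assumptions made for the example, since the slide only names the thresholds symbolically.

```python
from dataclasses import dataclass
from typing import List

# Assumed threshold values, for illustration only; the slide gives only
# the symbolic names FEW, VERY HIGH and ONE THIRD.
FEW = 5
VERY_HIGH_WMC = 47
ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    """Hypothetical record holding the three metrics used by the rule."""
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion, between 0 and 1


def is_god_class(m: ClassMetrics) -> bool:
    # All three conditions must hold, as in the AND-composition on the slide.
    return m.atfd > FEW and m.wmc >= VERY_HIGH_WMC and m.tcc < ONE_THIRD


def detect_god_classes(classes: List[ClassMetrics]) -> List[str]:
    """Return the names of all classes matching the detection strategy."""
    return [m.name for m in classes if is_god_class(m)]


sample = [
    ClassMetrics("Server", atfd=12, wmc=60, tcc=0.10),  # matches all three rules
    ClassMetrics("Point", atfd=0, wmc=4, tcc=0.90),     # small and cohesive
    ClassMetrics("Parser", atfd=9, wmc=80, tcc=0.50),   # complex but cohesive
]
print(detect_god_classes(sample))
```

Because each rule is a plain predicate over metrics, other detection strategies can be written the same way by swapping the metrics and thresholds.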
  • Polymetric views show up to 5 metrics: width metric, height metric, position metrics, color metric (Lanza 2003)
  • System Complexity shows class hierarchies (Lanza, Ducasse 2003)
  • Class Blueprint shows class internals (Ducasse, Lanza 2005)
  • Package Blueprint shows package usage (Ducasse et al. 2007)
  • Distribution Map shows properties over structure (Ducasse et al. 2006)
  • Semantic Clustering reveals implementation topics: user, run, load, message, file, buffer, util; property, AWT, edit, show, update, sp, set; start, buffer, end, text, length, line, count; action, box, component, event, button, layout, GUI; start, length, integer, end, number, pre, count; XML, dispatch, microstar, reader, XE, register, receive; current, buffer, idx, review, archive, endr, TAR; BSH, simple, invocation, assign, untype, general, arbitrary; maximum, label, link, item, code, put, vector (Kuhn et al. 2006)
  • Software Map gives software space a meaning (Kuhn et al. 2008)
  • Softwarenaut explores the package structure (Lungu et al. 2006)
  • CodeCity shows where your code lives (Wettel, Lanza 2007)
  • Trace Signals reveal similar execution traces (Kuhn, Greevy 2006)
  • Feature Views show how features cover classes; features shown: addFolder, addPage (Greevy et al. 2006)
  • Feature Map relates features to code (Greevy et al. 2007)
  • Object Flow captures object aliases (Lienhard 2009)
  • Object Flow shows how objects move (Lienhard et al. 2007)
  • Object Dependencies reveal feature dependencies; features shown: Open, Join Channel, Connect, Send Message (Lienhard et al. 2007)
  • Hierarchy Evolution reveals evolution patterns (Gîrba et al. 2005)
  • Evolution Radar shows co-change relationships (D'Ambros, Lanza 2006)
  • Ownership Map reveals patterns in CVS (Gîrba et al. 2006)
  • Kumpel shows how developers work on files (Junker 2008)
  • Clone Evolution shows who copied from whom (Balint et al. 2006)
  • [Figure: source code reduced to metrics and queries, e.g. McCabe = 21, NOM, LOC, classes select: #isGodClass]
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, and a collaboration
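  • FAMIX is a language-independent meta-model. Entities: Package, Namespace, Class, Inheritance, Method, Attribute, Invocation, Access. Associations: Class packagedIn Package; Class belongsTo Namespace; Inheritance links a superclass and a subclass; Method belongsTo Class; Attribute belongsTo Class; Invocation is invokedBy a Method and has candidate Methods; Access is accessedIn a Method and accesses an Attribute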
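
To make the shape of this meta-model concrete, here is a rough sketch of the core entities as Python dataclasses. It is only an illustration of the diagram: the Python class and field names are assumptions derived from the association labels on the slide, not the actual FAMIX implementation (which lives in Smalltalk), and `superclasses_of` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Package:
    name: str


@dataclass
class Namespace:
    name: str


@dataclass
class FamixClass:
    name: str
    belongs_to: Optional[Namespace] = None   # Class belongsTo Namespace
    packaged_in: Optional[Package] = None    # Class packagedIn Package


@dataclass
class Method:
    name: str
    belongs_to: FamixClass                   # Method belongsTo Class


@dataclass
class Attribute:
    name: str
    belongs_to: FamixClass                   # Attribute belongsTo Class


@dataclass
class Inheritance:
    superclass: FamixClass
    subclass: FamixClass


@dataclass
class Invocation:
    invoked_by: Method                       # the calling method
    candidates: List[Method]                 # possible targets of the call


@dataclass
class Access:
    accessed_in: Method                      # method performing the access
    accesses: Attribute                      # attribute being read or written


def superclasses_of(cls: FamixClass, inheritances: List[Inheritance]) -> List[FamixClass]:
    """Hypothetical helper: direct superclasses, found by scanning the associations."""
    return [i.superclass for i in inheritances if i.subclass is cls]


obj = FamixClass("Object")
server = FamixClass("Server", belongs_to=Namespace("net"), packaged_in=Package("core"))
inheritances = [Inheritance(superclass=obj, subclass=server)]
print([c.name for c in superclasses_of(server, inheritances)])
```

Modeling Inheritance, Invocation and Access as separate objects rather than plain references mirrors the diagram, where associations are entities in their own right and can therefore carry their own properties.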
  • [Diagram: System History, System Version, Class History, Class Version]
  • Hismo is the history meta-model: History, Version (Gîrba 2005)
  • What changed? When did it change? Number of methods per version for five classes: (2 4 3 5 7), (2 2 3 4 9), (2 2 1 2 3), (2 2 2 2 2), (1 5 3 4 4)
  • Evolution of Number of Methods: $\mathrm{ENOM}(C) = \sum_{i=2}^{n} |\mathrm{NOM}_i(C) - \mathrm{NOM}_{i-1}(C)|$. For the history 1 5 3 4 4: $\mathrm{ENOM}(C) = 4 + 2 + 1 + 0 = 7$ (Gîrba et al. 2004)
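  • Latest Evolution of Number of Methods: $\mathrm{LENOM}(C) = \sum_{i=2}^{n} |\mathrm{NOM}_i(C) - \mathrm{NOM}_{i-1}(C)| \cdot 2^{i-n}$. Earliest Evolution of Number of Methods: $\mathrm{EENOM}(C) = \sum_{i=2}^{n} |\mathrm{NOM}_i(C) - \mathrm{NOM}_{i-1}(C)| \cdot 2^{2-i}$. For the history 1 5 3 4 4: $\mathrm{LENOM}(C) = 4 \cdot 2^{-3} + 2 \cdot 2^{-2} + 1 \cdot 2^{-1} + 0 \cdot 2^{0} = 1.5$ and $\mathrm{EENOM}(C) = 4 \cdot 2^{0} + 2 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3} = 5.25$ (Gîrba et al. 2004)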
  • ENOM, LENOM and EENOM for the five histories (Gîrba et al. 2004):
      History     ENOM  LENOM  EENOM
      2 4 3 5 7   7     3.5    3.25
      2 2 3 4 9   7     5.75   1.37
      2 2 1 2 3   3     1      2
      2 2 2 2 2   0     0      0
      1 5 3 4 4   7     1.5    5.25
  • Evolution patterns by ENOM, LENOM and EENOM (Gîrba et al. 2004):
      Pattern            ENOM  LENOM  EENOM
      balanced changer   7     3.5    3.25
      late changer       7     5.75   1.37
      (unlabeled)        3     1      2
      dead stable        0     0      0
      early changer      7     1.5    5.25
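
The three history measurements can be reproduced with a few lines of code. The sketch below is a direct Python transcription of the formulas on the slides, with the version index i running from 2 to n; the function names are chosen for this example only.

```python
from typing import List


def deltas(history: List[int]) -> List[int]:
    """Absolute change in number of methods between consecutive versions."""
    return [abs(curr - prev) for prev, curr in zip(history, history[1:])]


def enom(history: List[int]) -> int:
    """Evolution of Number of Methods: sum of all absolute changes."""
    return sum(deltas(history))


def lenom(history: List[int]) -> float:
    """Latest evolution: each change is weighted by 2^(i-n), favoring recent versions."""
    n = len(history)
    return sum(d * 2 ** (i - n) for i, d in enumerate(deltas(history), start=2))


def eenom(history: List[int]) -> float:
    """Earliest evolution: each change is weighted by 2^(2-i), favoring early versions."""
    return sum(d * 2 ** (2 - i) for i, d in enumerate(deltas(history), start=2))


for history in ([1, 5, 3, 4, 4], [2, 4, 3, 5, 7]):
    print(history, enom(history), lenom(history), eenom(history))
# [1, 5, 3, 4, 4] 7 1.5 5.25
# [2, 4, 3, 5, 7] 7 3.5 3.25
```

Both histories change by the same total amount (ENOM = 7), but the weights separate them: a high EENOM with a low LENOM marks an early changer, while similar values mark a balanced changer.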
  • FAMIX is a family of meta-models: FAMIX Core (Class, Method, ...); Dynamix (Instance, Activation, ...); ObjectFlow (Alias, ...); BugsLife (Bug, Activity, ...); CVS (File History, File Version, ...); Subversion (File History, File Version, ...); Dude (Duplication, ...); Hismo (Class History, Method History, ...); ...
  • FM3 is the meta-meta-model: FM3.Element (name: String, fullName: String); FM3.Package; FM3.Class; FM3.Property (derived: Boolean, keyed: Boolean, multivalued: Boolean); associations: superclass, opposite, type, extensions (Kuhn, Verwaest 2008)
  • MSE is the exchange format (Kuhn, Verwaest 2008):
      (FAMIX.Class (id: 100)
        (name 'Server')
        (container (ref: 82))
        (isAbstract false)
        (isInterface false)
        (package (ref: 624))
        (stub false)
        (NOM 9)
        (WLOC 124))
      (FAMIX.Method (id: 101)
        (name 'accept')
        (signature 'accept(Visitor v)')
        (parentClass (ref: 100))
        (accessControl 'public')
        (hasClassScope false)
        (stub false)
        (LOC 7)
        (CYCLO 3))
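
Since the MSE snippet is a parenthesized, Lisp-like notation, a reader for it can be sketched in a few dozen lines. The Python code below is a simplified illustration that handles only the constructs visible on the slide (element types, `id:` and `ref:`, quoted strings, integers, booleans, one value per attribute). It is not the full MSE grammar and not the parser shipped with Moose or Fame; all function names are made up for this example.

```python
import re

# Tokens: parentheses, single-quoted strings, or runs of other non-space characters.
TOKEN = re.compile(r"\(|\)|'(?:[^']|'')*'|[^\s()']+")

SAMPLE = """
(FAMIX.Class (id: 100) (name 'Server') (container (ref: 82))
    (isAbstract false) (NOM 9) (WLOC 124))
(FAMIX.Method (id: 101) (name 'accept') (signature 'accept(Visitor v)')
    (parentClass (ref: 100)) (LOC 7) (CYCLO 3))
"""


def parse(text):
    """Turn the text into nested lists, one list per parenthesized form."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


def atom(tok):
    """Convert a token into a Python string, bool or int where possible."""
    if tok.startswith("'"):
        return tok[1:-1].replace("''", "'")
    if tok in ("true", "false"):
        return tok == "true"
    try:
        return int(tok)
    except ValueError:
        return tok


def to_element(form):
    """Map one top-level form to a dict; only the first value of each attribute is kept."""
    kind, *attrs = form
    element = {"type": kind}
    for attr in attrs:
        if len(attr) < 2:
            continue
        key, value = attr[0], attr[1]
        if key == "id:":
            element["id"] = atom(value)
        elif isinstance(value, list):          # e.g. (container (ref: 82))
            element[key] = {"ref": atom(value[1])}
        else:
            element[key] = atom(value)
    return element


elements = [to_element(form) for form in parse(SAMPLE)]
by_id = {e["id"]: e for e in elements}
method = elements[1]
print(method["name"], "belongs to", by_id[method["parentClass"]["ref"]]["name"])
```

Resolving `ref:` values against an id index, as in the last lines, is how the method `accept` is linked back to the class `Server` in this sketch.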
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, and a collaboration
  • Mondrian scripts graph visualizations (Meyer et al. 2006):
      view nodes: classes forEach: [ :each |
          view nodes: each methods.
          view gridLayout ].
      view edgesFrom: #superclass.
      view treeLayout.
  • EyeSee scripts charts (Junker, Hofstetter 2007)
  • CodeCity scripts 3D visualizations (Wettel 2008)
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, and a collaboration
  • [Architecture diagram] Core: Repository, FAMIX, Fame, UI, Mondrian, EyeSee. Sources: Java, C++, iPlasma, MSE, Smalltalk. Importers: CVS, J-Wiretap, MSE, Source, SVN. Tools: Chronia, CodeCity, DynaMoose, Hapax, SmallDude, Yellow Submarine, Softwarenaut, BugsLife, Clustering, Metanool, ...
  • Model, Helpers, GUI (Murphy et al. 1995)
  • Model, Helpers
  • (Brühlmann et al. 2008)
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, and a collaboration
  • Université Catholique de Louvain, INRIA Lille, Politehnica University of Timisoara, University of Berne, University of Lugano
  • Current and previous team: Stéphane Ducasse, Serge Demeyer, Tudor Gîrba, Adrian Kuhn, Michele Lanza, Sander Tichelaar. Current and previous contributors (> 100 man-years): Hani Abdeen, Jannik Laval, Tobias Aebi, Michael Meer, Ilham Alloui, Michael Meyer, Philipp Bunge, Adrian Lienhard, Gabriela Arevalo, Laura Ponisio, Alexandre Bergel, Mircea Lungu, Mihai Balint, Daniel Ratiu, Johan Brichau, Oscar Nierstrasz, Frank Buchli, Matthias Rieger, Thomas Bühler, Azadeh Razavizadeh, Marco D'Ambros, Damien Pollet, Calogero Butera, Andreas Schlapbach, Simon Denier, Jorge Ressia, Daniel Frey, Daniel Schweizer, Georges Golomingi, Mauricio Seeberger, Orla Greevy, Toon Verwaest, David Gurtner, Lukas Steiger, Matthias Junker, Richard Wettel, Reinout Heeck, Daniele Talerico, Markus Hofstetter, Herve Verjus, Markus Kobel, Violeta Voinescu, Michael Locher, Sara Sellos, Martin von Löwis, Lucas Streit, Pietro Malorgio, Roel Wuyts
  • Moose is an analysis tool, a modeling platform, a visualization platform, a tool-building platform, a collaboration, and an idea
  • [Slide showing the first pages of Moose-related papers] Scripting Visualizations with Mondrian (Meyer, Gîrba); Semantic Clustering: Identifying Topics in Source Code (Kuhn, Ducasse, Gîrba; preprint submitted to Elsevier Science, 11 October 2006); Test Blueprints: Exposing Side Effects in Execution Traces to Support Writing Unit Tests (Lienhard, Gîrba, Greevy, Nierstrasz); The Story of Moose: an Agile Reengineering Environment (Nierstrasz, Ducasse, Gîrba; Proceedings ESEC-FSE'05, pp. 1-10); Practical Object-Oriented Back-in-Time Debugging (Lienhard, Gîrba, Nierstrasz); Enriching Reverse Engineering with Annotations (Brühlmann, Gîrba, Greevy, Nierstrasz; Models 2008, LNCS vol. 5301, pp. 660-674)
  • Research should be more than papers

    [Slide shows the first pages of six papers; their text is reproduced below, one paper at a time.]

    Scripting Visualizations with Mondrian
    Michael Meyer and Tudor Gîrba
    Software Composition Group, University of Berne, Switzerland

    Abstract. Most visualization tools focus on a finite set of dedicated visualizations that are adjustable via a user-interface. In this demo, we present Mondrian, a new visualization engine designed to minimize the time-to-solution. We achieve this by working directly on the underlying data, by making nesting an integral part of the model and by defining a powerful scripting language that can be used to define visualizations. We support exploring data in an interactive way by providing hooks for various events. Users can register actions for these events in the visualization script.

    1 Introduction. Visualization is an established tool to reason about data. Given a wanted visualization, we can typically find tools that take as input a certain format and that provide the needed visualization [4]. One drawback of the approach is that, when a deep reasoning is required, we need to refer back to the capabilities of the original tool that manipulates the original data. Another drawback is that it actually duplicates the required resources unnecessarily: the data is present both in the original tool, and in the visualization tool. Several tools take a middle ground approach and choose to work closely with the data by either offering integration with other services [1], or providing the services themselves [2]. However, when another type of service is required, the integration is lost. We present Mondrian, a visualization engine that implements a radically different approach. Instead of providing a required data format, we provide a simple interface through which the programmer can easily script the visualization in a declarative fashion (more information can be found in [3]). That is, our solution works directly with the objects in the data model, and instead of duplicating the objects by model transformation, we transform the messages sent to the original objects via meta-model transformations.

    2 Mondrian by example. In this section we give a simple step-by-step example of how to script a visualization using Mondrian. The example builds on a small model of a source code with 32 classes. The task we propose is to provide an overview of the hierarchies.

    Creating a view and adding nodes. Suppose we can ask the model object for the classes. We can add those classes to a newly created view by creating a node for each class, where each node is represented as a Rectangle.

        view := ViewRenderer new.
        view newShape rectangle;
            width: #NOA; height: #NOM; linearColor: #LOC within: model classes;
            withBorder.
        view nodes: model classes.
        view open.

    In the case above, NOA, NOM and LOC are methods in the object representing a class and return the value of the corresponding metric.

    Adding edges and layouting. To show how classes inherit from each other, we can add an edge for each inheritance relationship. In our example, supposing that we can ask the model for all the inheritance objects, and given an inheritance object, we will create an edge between the node holding the superclass and the node holding the subclass. We layout the nodes in a tree.

        view := ViewRenderer new.
        view newShape rectangle;
            width: #NOA; height: #NOM; linearColor: #LOC within: model classes;
            withBorder.
        view nodes: model classes.
        view edges: model inheritances
            from: #superclass
            to: #subclass.
        view treeLayout.
        view open.

    Nesting. To obtain more details for the classes, we would like to see which are the methods inside. To nest we specify for each node the view that goes inside. Supposing that we can ask each class in the model about its methods, we can add those methods to the class by specifying the view for each class.

    Semantic Clustering: Identifying Topics in Source Code
    Adrian Kuhn (a,1), Stéphane Ducasse (b,2), Tudor Gîrba (a,1)
    (a) Software Composition Group, University of Berne, Switzerland
    (b) Language and Software Evolution Group, LISTIC, Université de Savoie, France

    Abstract. Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing formal information the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it on several case studies, two of which we present in this paper.

    Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.

    Key words: reverse engineering, clustering, latent semantic indexing, visualization

    Email addresses: akuhn@iam.unibe.ch (Adrian Kuhn), sduca@unv-savoie.fr (Stéphane Ducasse), girba@iam.unibe.ch (Tudor Gîrba).
    (1) We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “Recast: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1).
    (2) We gratefully acknowledge the financial support of the French ANR for the project “Cook: Réarchitecturisation des applications à objets”.
    Preprint submitted to Elsevier Science, 11 October 2006.

    Test Blueprints - Exposing Side Effects in Execution Traces to Support Writing Unit Tests
    Adrian Lienhard*, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz
    Software Composition Group, University of Bern, Switzerland
    {lienhard, girba, greevy, oscar}@iam.unibe.ch

    Abstract. Writing unit tests for legacy systems is a key maintenance task. When writing tests for object-oriented programs, objects need to be set up and the expected effects of executing the unit under test need to be verified. If developers lack internal knowledge of a system, the task of writing tests is non-trivial. To address this problem, we propose an approach that exposes side effects detected in example runs of the system and uses these side effects to guide the developer when writing tests. We introduce a visualization called Test Blueprint, through which we identify what the required fixture is and what assertions are needed to verify the correct behavior of a unit under test. The dynamic analysis technique that underlies our approach is based on both tracing method executions and on tracking the flow of objects at runtime. To demonstrate the usefulness of our approach we present results from two case studies.

    Keywords: Dynamic Analysis, Object Flow Analysis, Software Maintenance, Unit Testing

    1 Introduction. Creating automated tests for legacy systems is a key maintenance task [9]. Tests are used to assess if legacy behavior has been preserved after performing modifications or extensions to the code. Unit testing (i.e., tests based on the XUnit frameworks [1]) is an established and widely used testing technique. It is now generally recognized as an essential phase in the software development life cycle to ensure software quality, as it can lead to early detection of defects, even if they are subtle and well hidden [2]. The task of writing a unit test involves (i) choosing an appropriate program unit, (ii) creating a fixture, (iii) executing the unit under test within the context of the fixture, and (iv) verifying the expected behavior of the unit using assertions [1]. All these actions require detailed knowledge of the system. Therefore, the task of writing unit tests may prove difficult as developers are often faced with unfamiliar legacy systems.

    Implementing a fixture and all the relevant assertions required can be challenging if the code is the only source of information. One reason is that the gap between static structure and runtime behavior is particularly large with object-oriented programs. Side effects (1) make program behavior more difficult to predict. Often, encapsulation and complex chains of method executions hide where side effects are produced [2]. Developers usually resort to using debuggers to obtain detailed information about the side effects, but this implies low level manual analysis that is tedious and time consuming [25].

    Thus, the underlying research question of the work we present in this paper is: how can we support developers faced with the task of writing unit tests for unfamiliar legacy code? The approach we propose is based on analyzing runtime executions of a program. Parts of a program execution, selected by the developer, serve as examples for new unit tests. Rather than manually stepping through the execution with a debugger, we perform dynamic analysis to derive information to support the task of writing tests without requiring a detailed understanding of the source code.

    In our experimental tool, we present a visual representation of the dynamic information in a diagram similar to the UML object diagram [11]. We call this diagram a Test Blueprint as it serves as a plan for implementing a test. It reveals the minimal required fixture and the side effects that are produced during the execution of a particular program unit. Thus, the Test Blueprint reveals the exact information that should be verified with a corresponding test.

    To generate a Test Blueprint, we need to accurately analyze object usage, object reference transfers, and the side effects that are produced as a result of a program execution. To do so, we perform a dynamic Object Flow Analysis in conjunction with conventional execution tracing [17]. Object Flow Analysis is a novel dynamic analysis which tracks the transfer of object references in a program execution. In previous work, we demonstrated how we success- [text cut off on the slide]

    (1) We refer to side effect as the program state modifications produced by a behavior. We consider the term program state to be limited to the scope of the application under analysis (i.e., excluding socket or display updates).

    The Story of Moose: an Agile Reengineering Environment
    Oscar Nierstrasz, Stéphane Ducasse, Tudor Gîrba
    Software Composition Group, University of Berne, Switzerland
    www.iam.unibe.ch/∼scg

    ABSTRACT. Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and a generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.

    Categories and Subject Descriptors: D.2.7 [Software Engineering]: Maintenance - Restructuring, reverse engineering, and reengineering
    General Terms: Measurement, Design, Experimentation
    Keywords: Reverse engineering, Reengineering, Metrics, Visualization

    1. INTRODUCTION. Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11]. The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].

    [Figure 1: The Reengineering life cycle. Layers: Requirements, Designs, Code. Activities: model capture and analysis, problem assessment, migration.]

    What may not be clear from this very simplified picture is that various kinds of documents are available to the software reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].

    Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we require to apply to understand this data. These techniques range from data mining, to data presentation and to data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.

    Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core are provided various services that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.

    Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution [text cut off on the slide]

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Proceedings ESEC-FSE’05, pp. 1-10, ISBN 1-59593-014-0. September 5–9, 2005, Lisbon, Portugal. Copyright 2005 ACM 1-59593-014-0/05/0009 ...$5.00.

    Practical Object-Oriented Back-in-Time Debugging
    Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz
    Software Composition Group, University of Bern, Switzerland

    Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the “omniscient” approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counter-intuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.

    1 Introduction. When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug’s cause [2].

    Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object’s fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].

    The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor 100 or more for non-trivial programs.

    Enriching Reverse Engineering with Annotations
    Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz
    Software Composition Group, University of Bern, Switzerland
    http://scg.unibe.ch/

    Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.

    1 Introduction. Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.

    Only few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views which make use of rules that encode external constraints and are checked against the actual source code [3].

    In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach [text cut off on the slide]

    Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.
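The Mondrian script quoted on this slide works directly on model objects: metric selectors such as #NOA, #NOM and #LOC drive node width, height and color, inheritance objects become edges, and the nodes are laid out as a tree. The following is a minimal Python sketch of that idea, not Mondrian's API. The names `Klass`, `Inheritance`, `Node`, `View`, `add_nodes`, `add_edges` and `tree_layout` are all hypothetical, and the sample classes and metric values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Klass:
    """Toy model object; the metric names mirror the selectors on the slide."""
    name: str
    NOA: int  # number of attributes
    NOM: int  # number of methods
    LOC: int  # lines of code


@dataclass(frozen=True)
class Inheritance:
    """Toy inheritance object linking a superclass to a subclass."""
    superclass: Klass
    subclass: Klass


@dataclass
class Node:
    model: Klass
    width: int
    height: int
    shade: float  # 0.0 .. 1.0, linear in the color metric
    depth: int = 0
    x: float = 0.0
    children: list = field(default_factory=list)


class View:
    """Hypothetical scriptable view (an assumption, not Mondrian's ViewRenderer)."""

    def __init__(self):
        self.nodes = {}  # model object -> Node
        self.edges = []  # (parent Node, child Node) pairs

    def add_nodes(self, models, width, height, color):
        # Metric names are passed as strings and read off the model objects,
        # so no intermediate data format is needed.
        models = list(models)
        top = max((getattr(m, color) for m in models), default=0) or 1
        for m in models:
            self.nodes[m] = Node(m, getattr(m, width), getattr(m, height),
                                 getattr(m, color) / top)

    def add_edges(self, relations, source, target):
        for r in relations:
            parent = self.nodes[getattr(r, source)]
            child = self.nodes[getattr(r, target)]
            parent.children.append(child)
            self.edges.append((parent, child))

    def tree_layout(self):
        # Leaves get consecutive x positions; a parent is centered over its children.
        has_parent = {id(child) for _, child in self.edges}
        roots = [n for n in self.nodes.values() if id(n) not in has_parent]
        next_leaf = 0

        def place(node, depth):
            nonlocal next_leaf
            node.depth = depth
            if not node.children:
                node.x = float(next_leaf)
                next_leaf += 1
            else:
                for child in node.children:
                    place(child, depth + 1)
                node.x = sum(c.x for c in node.children) / len(node.children)

        for root in roots:
            place(root, 0)


# Made-up sample model: four classes and three inheritance relationships.
obj = Klass("Object", NOA=0, NOM=4, LOC=40)
shape = Klass("Shape", NOA=2, NOM=6, LOC=100)
circle = Klass("Circle", NOA=1, NOM=3, LOC=50)
square = Klass("Square", NOA=1, NOM=2, LOC=25)
classes = [obj, shape, circle, square]
inheritances = [Inheritance(obj, shape),
                Inheritance(shape, circle),
                Inheritance(shape, square)]

view = View()
view.add_nodes(classes, width="NOA", height="NOM", color="LOC")
view.add_edges(inheritances, source="superclass", target="subclass")
view.tree_layout()
```

Passing attribute names as strings plays the role of the Smalltalk symbols in the original script: the view reads the values from the model objects themselves rather than from a copied data structure. Drawing is left out, so the sketch only computes sizes, shades and positions.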
In a recent study, Liblit et al. examined bug symptoms for various programs and found 1. INTRODUCTION Reengineering is a complex task, and it usually involves 1 Introduction Software systems need to evolve continuously to be effec- that in 50% of the cases the execution stack contains essentially no information about several techniques. The more data we have at hand, the tive [41]. As systems evolve, their structure decays, unless more techniques we require to apply to understand this data. Most reverse engineering techniques focus on automatically extracting infor- the bug’s cause [2]. effort is undertaken to reengineer them [41, 44, 23, 11]. These techniques range from data mining, to data presen- mation from the source code without taking external human knowledge into Classical debuggers are not always up to the task, since they only provide access to The reengineering process comprises various activities, in- tation and to data manipulation. Different techniques are consideration. More often than not however, important external information is information that is still in the run-time stack. In particular, the information needed to cluding model capture and analysis (i.e., reverse engineer- implemented in different tools, by different people. An in- ing), assessment of problems to be repaired, and migration available (e.g., developer knowledge or domain specific knowledge) which would track down these difficult bugs includes (1) how an object reference got here, and (2) frastructure is needed for integrating all these tools. from the legacy software towards the reengineered system. greatly enhance analyses if it could be taken into account. the previous values of an object’s fields. 
For this reason it is helpful to have previous ob- Moose is a reengineering environment that offers a com- Although in practice this is an ongoing and iterative process, mon infrastructure for various reverse- and re-engineering Only few reverse engineering approaches integrate such external human knowl- ject states and object reference flow information at hand during debugging. Techniques we can idealize it (see Figure 1) as a transformation through tools [22]. At the core of Moose is a common meta-model edge into the analysis. For example, reflexion models have been proposed for ar- and tools like back-in-time debuggers, which allow one to inspect previous program various abstraction layers from legacy code towards a new for representing software systems in a language-independent chitecture recovery by capturing developer knowledge and then manually map- states and step backwards in the control flow, have gained increasing attention recently system [11, 13, 35]. way. Around this core are provided various services that What may not be clear from this very simplified picture is ping this knowledge to the source code [1,2]. Another example is provided by [3,4,5,6]. are available to the different tools. These services include that various kinds of documents are available to the software metrics evaluation and visualization, a repository for storing Intensional Views which make use of rules that encode external constraints and The ideal support for a back-in-time debugger is provided by an omniscient imple- multiple models, a meta-meta model for tailoring the Moose are checked against the actual source code [3]. mentation that remembers the complete object history, but such solutions are imprac- tical because they generate enormous amounts of information. 
Storing the data to disk Permission to make digital or hard copies of all or part of this work for meta-model, and a generic GUI for browsing, querying and In this paper we propose a generic framework based on annotations to en- personal or classroom use is granted without fee provided that copies are grouping. instead of keeping it in memory can alleviate the problem, but it only postpones the Moose has been developed over nearly ten years, and has hance a reverse engineered model with external knowledge so that automatic not made or distributed for profit or commercial advantage and that copies end, and it has the drawback of further increasing the runtime overhead. Current imple- bear this notice and the full citation on the first page. To copy otherwise, to itself been extensively reengineered during the time that it analyses can take this knowledge into account. A key feature of our approach republish, to post on servers or to redistribute to lists, requires prior specific has evolved. Initially Moose was little more than a com- mentations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor permission and/or a fee. Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, mon meta-model for integrating various ad hoc tools. As it 100 or more for non-trivial programs. Proceedings ESEC-FSE’05, pp. 1-10, ISBN 1-59593-014-0. September became apparent that these tools would benefit immensely 2008, pp. 660-674. 5–9, 2005, Lisbon, Portugal. Copyright 2005 ACM 1-59593-014-0/05/0009 ...$5.00. from a common infrastructure, we invested in the evolution
  • Research is a puzzle [figure labels: addFolder, addPage]
  • Greevy et al. 2007
  • Research is a puzzle
  • Wettel, Lanza 2008
  • Research can / should be open
  • Research can / should impact industry
  • is an analysis tool is a modeling platform is a visualization platform is a tool building platform is a collaboration is an idea
  • moose.unibe.ch
  • Tudor Gîrba www.tudorgirba.com creativecommons.org/licenses/by/3.0/