Reverse Engineering: A Roadmap ¨ Hausi A. Muller Jens H. Jahnke Dennis B. Smith Dept. of Computer Science Dept. of Computer Science Software Engineering Institute University of Victoria, Canada University of Victoria, Canada Carnegie Mellon University, USA firstname.lastname@example.org email@example.com firstname.lastname@example.org Margaret-Anne Storey Scott R. Tilley Kenny Wong Dept. of Computer Science Dept. of Computer Science Dept. of Computing Science University of Victoria, Canada University of California, University of Alberta, Canada email@example.com Riverside, USA firstname.lastname@example.org email@example.comABSTRACT of capabilities to explore, manipulate, analyze, summarize,By the early 1990s the need for reengineering legacy systems hyperlink, synthesize, componentize, and visualize softwarewas already acute, but recently the demand has increased sig- artifacts. These capabilities include documentation in manyniﬁcantly with the shift toward web-based user interfaces. forms and intermediate representations for code, data, and ar-The demand by all business sectors to adapt their informa- chitecture. Many reverse engineering tools focus on extract-tion systems to the Web has created a tremendous need for ing the structure of a legacy system with the goal of transfer-methods, tools, and infrastructures to evolve and exploit ex- ring this information into the minds of the software engineersisting applications efﬁciently and cost-effectively. Reverse trying to reengineer or reuse it. In corporate settings, reverseengineering has been heralded as one of the most promising engineering tools still have a long way to go before becom-technologies to combat this legacy systems problem. ing an effective and integral part of the standard toolset that a typical software engineer uses day-to-day.This paper presents a roadmap for reverse engineering re-search for the ﬁrst decade of the new millennium, building The vitality of the ﬁeld has been demonstrated by three an-on the program comprehension theories of the 1980s and the nual conferences that helped to spark interest in the ﬁeld andreverse engineering technology of the 1990s. shape its ideas and focus: the Working Conference on Re- verse Engineering (WCRE), the International Workshop onKeywords Program Comprehension (IWPC), and the Workshop on Pro-Software engineering, reverse engineering, data reverse en- gram Analysis for Software Tools and Engineering (PASTE).gineering, program understanding, program comprehension,software analysis, software evolution, software maintenance, This paper presents a roadmap for reverse engineering re-software reengineering, software migration, software tools, search for the ﬁrst decade of the new millennium, buildingtool adoption, tool evaluation. on the program comprehension theories of the 1980s and the reverse engineering technology of the 1990s. We describe se-1 INTRODUCTION lected research agendas for code and data reverse engineer-The notion of computers automatically ﬁnding useful infor- ing, as well as research strategies for tool development andmation is an exciting and promising aspect of just about any evaluation. Investing in program understanding technologyapplication intended to be of practical use . A decade is critical for the software and information technology indus-ago, following up on the successes of the early CASE tools, try to control the inherent high costs and risks of legacy sys-Chikofsky and Cross introduced a taxonomy for reverse engi- tem evolution. Reverse engineering is a truly exciting ﬁeld ofneering and design recovery . They deﬁned reverse engi- research that is ready to be taught in computer science, com-neering to be “analyzing a subject system to identify its cur- puter engineering, and software engineering curricula .rent components and their dependencies, and to extract andcreate system abstractions and design information.” In summarizing the major research trends, accomplishments, and unanswered needs, this paper is divided into four ma-Over the past ten years, researchers have produced a number jor parts. Section 2 concentrates on code reverse engineer- ing, which has been the main focus of attention in this ﬁeld over the past decade. In contrast, data reverse engineering, the topic of Section 3, is not as well established, but is ex- pected to gain prominence in the new millennium. Section 4 explores the spectrum of reverse engineering tools. Section 5 deals with the question of why software reverse engineering tools are not more widely used, and Section 6 concludes the
paper. ful understanding and insight. The structural, functional, and behavioral code analyses , however, require intensive hu-2 CODE REVERSE ENGINEERING man input to construct from scratch. These analyses are dif-In current research and practice, the focus of both forward ﬁcult to interpret, and are costly efforts with high risk.and reverse engineering is at the code level. Forward engi-neering processes are geared toward producing quality code. Continuous Program UnderstandingThe importance of the code level is underscored in legacy To avoid a crisis, it is important to address information needssystems where important business rules are actually buried in more effectively throughout the software lifecycle. We needthe code . During the evolution of software, change is ap- to better support the forward and backward traceability ofplied to the source code, to add function, ﬁx defects, and en- software artifacts. For example, in the forward direction,hance quality. In systems with poor documentation, the code given a design module, it is important to be able to obtainis the only reliable source of information about the system. the code elements that implement it. In the backward direc-As a result, the process of reverse engineering has focused tion, given a source or object ﬁle, we need to be able to obtainon understanding the code. the business rule to which it contributes. In addition it is im- portant to determine when it is most appropriate to focus theOver the past ten years, reverse engineering research has pro- analysis at different levels of abstraction [7, 43].duced a number of capabilities for analyzing code, includingsubsystem decomposition [13, 86], concept synthesis , de- For understanding purposes, traceability is especially impor-sign, program and change pattern matching [16, 31, 59, 76], tant. We need to be able to take a pattern of change, suchprogram slicing and dicing , analysis of static and dy- as updating a tax law, and map this law explicitly into soft-namic dependencies , object-oriented metrics , and ware structures. Part of program comprehension is to recon-software exploration and visualization . In general, these struct mappings between the application and implementationanalyses have been successful at treating the software at the domains . Thus, to ease long-term understanding, thesesyntactic level to address speciﬁc information needs and to mappings must be made explicit, recorded, reused, and up-span relatively narrow information gaps. dated continuously. The vision is that reverse engineering would be applied incrementally, in small loops with forwardHowever, the code does not contain all the information that engineering, rather than as a desperate attempt at resurrectingis needed. Typically, knowledge about architecture and de- a poorly understood system.sign tradeoffs, engineering constraints, and the applicationdomain only exists in the minds of the software engineers . Several research issues, formulated as questions, need to beOver time, memories fade, people leave, documents decay, addressed to enable this capability for “continuous programand complexity increases . Consequently, an under- understanding” .standing gap arises between known, useful information andthe required information needed to enable software change. • What are the long-term information needs of a softwareAt some point, the gap may become too wide to be easily system?spanned by the syntactic, semantic, and dynamic analyses • What patterns of change do software systems undergo?provided by traditional programming tools. • What mappings need to be explicitly recorded? • What kind of software repository could represent the re-Thus when we focus only at the low levels of abstraction, we quired information?miss the big picture behind the evolution of a software sys- • What are the requirements of tool support to producetem . There is a need to focus future research on the more and manipulate the mappings?signiﬁcant levels of the business processes and the software • How can this support coexist with traditional, code-architecture. For example, knowledge of software architec- dominated tools, users, and processes?ture from multiple user perspectives is needed to make large-scale, structural changes , and the capability to perform Reverse Engineering Processarchitecture reconstruction is becoming increasingly impor- In addition to an emphasis on “continuous program under-tant . Developers need information about the impacts of standing,” it is important to focus efforts on a better deﬁnitionpotential changes. Managers need information to assign and of the reverse engineering process. Reverse engineering hascoordinate their personnel. If the information to create this typically been performed in an ad hoc manner. To address theknowledge can be maintained continuously, we could gener- technical issues effectively, the process must become moreate the required perspectives on a continuous basis without mature and repeatable, and more of its elements need to becostly reverse engineering efforts. supported by automated tools.Because such analyses are rarely performed today, current For example, a developer might require the software com-system evolution efforts often experience a time of crisis at ponents that contribute to a speciﬁc system responsibility.which the gap between desired information and available in- The subsystem view to present this information should notformation becomes critical. At that point reverse engineering require tedious manual manipulation. Instead, the mappingtechniques are inserted in a “big bang” attempt to regain use- between responsibility and components should be consulted
and a script would then generate the required view, with the the migration of information systems to the Web and towardsoption for minor, personal customization by the user. electronic commerce.Such a script is an instance of a reverse engineering pat- Researchers now recognize that the quality of a legacytern , a commonly used task or solution to produce un- system’s recovered data documentation can make or breakderstanding in a particular situation. By cataloging such pat- strategic information technology goals. For example dataterns and automating them through tool support, we would analysis is crucial in identifying the central business objectsimprove the maturity of the reverse engineering process. needed for migrating software systems to object-orientedThus, the insights of the SEI Capability Maturity Model R platforms. A negative example can be seen from the fact that(CMM R ) framework [36, 37] ought to apply to reverse en- difﬁculties in comprehending the data structure of legacy sys-gineering as well as forward engineering. Future research tems have been cited as barriers in replacing legacy softwareought to focus on ways to make the process of reverse engi- with modern business solutions (e.g., SAP, Baan, or PEO-neering more repeatable, deﬁned, managed, and optimized. PLESOFT ).Increased process maturity would enable better assessment The increased use of data warehouses and data mining tech-of the risks, costs, and economics of reengineering activities. niques for strategic decision support systems  have alsoWith poorly understood processes, the success of a reengi- motivated an interest in data reverse engineering technology.neering project rests solely on the ingenuity of the people Incorporating data from various legacy systems in data ware-involved—ingenuity that disappears when the project ends. houses requires a consistent mapping of legacy data struc-For evolving large software systems over long periods of tures on a common business object model. Similar chal-time, an appreciation of both product and process improve- lenges also occur with the web-based integration of formerlyment is needed. autonomous legacy information systems into cooperative, net-centric infrastructures.Research DirectionIn summary, for future research in reverse engineering, it is Data reverse engineering techniques can also be used to as-important to understand software at various levels of abstrac- sess the overall quality of software systems. An implementedtion and maintain mappings between these levels. Catalogs persistent data structure with signiﬁcant design ﬂaws indi-of information, tool, and process requirements are needed cates a poorly implemented software system. An analysisas a prelude to enabling continuous program understanding. of the data structures can help companies make decisionsUseful reverse engineering processes need to be identiﬁed on whether to purchase (and maintain) commercial-off-the-and better supported, as an important step to make the dis- shelf software packages. Data reverse engineering can alsocipline of reengineering more rational. Reverse engineering be used to assess the quality of the DBMS schema catalog oftools and processes need to evolve with the development en- vendor software, and thus it can represent one of the evalua-vironment that stresses components, the Web, and distributed tion criteria for a potential software product .systems . In general, reverse engineering the persistent data structure of3 DATA REVERSE ENGINEERING software systems using a DBMS is more speciﬁcally referredMost software systems for business and industry are informa- to as database reverse engineering. Since most DBMSs pro-tion systems, that is, they maintain and process vast amounts vide the functionality to extract initial information about theof persistent business data. While the main focus of code implemented physical data structure, database reverse engi-reverse engineering is on improving human understanding neering has a higher potential for automation than data re-about how this information is processed, data reverse engi- verse engineering . Consequently, most existing reverseneering tackles the question of what information is stored and engineering tools in this area consider information systemshow this information can be used in a different context. that employ a database platform. Many of these approaches are speciﬁcally targeted to relational systems [4, 26, 33, 40,Research in data reverse engineering has been under- 51, 64, 70].represented in the software reverse engineering arena fortwo main reasons. First, there is a traditional partition Data Reverse Engineering Process and the Role of Toolsbetween the database systems and software engineering Figure 1 shows that the data (base) reverse engineering pro-communities. Second, code reverse engineering appears at cess consists of two major activities, referred to as analysisﬁrst sight to be more challenging and interesting than data and abstraction, respectively.reverse engineering for academic researchers. Data AnalysisRecently, data reverse engineering concepts and techniques The analysis activity aims to recover an up-to-date logicalhave gained increasing attention in the reverse engineer- data model that is structurally complete and semantically an-ing arena. This has been driven by requirements for data- notated. In most cases, important information about the dataoriented mass software changes resulting from needs such model is missing in the physical schema catalog extractedas the Y2K problem, the European currency conversion, or from the DBMS. However, indicators for structural and se-
and idiosyncratic optimization patterns . Most ex- isting tools do not provide the necessary customizabil- ity to be applicable to this variety of application con- texts. Some approaches address this problem by provid- ing mechanisms for end-user programming with script- ing languages . In principle such tools provide a high amount of ﬂexibility. However, coding analysis operations and heuristics with scripting languages of- ten require signiﬁcant skills and experience. To ad- dress this problem, a number of dedicated, more ab- stract formalisms have been proposed to specify and customize reverse engineering processes [40, 70]. Due to their high level of abstraction these approaches facil- itate the customization process. However, they do not provide the same amount of ﬂexibility as scripting lan- guages. Consequently, a hybrid solution that combines high-level (e.g., rule-based) formalisms with low-level Figure 1: Data reverse engineering process (e.g., programming scripts)is a fruitful area for explo- ration.mantic schema constraints can be found in various parts of Conceptual Abstractionthe legacy information system, including its data, procedu- Conceptual abstraction aims to map the logical data modelral code, and documentation. Developers, users, and domain derived from data analysis to an equivalent conceptual de-experts can often contribute valuable knowledge. In general, sign. This design is usually represented by an entity-data analysis is an exploratory and human-intensive activity relationship or object-oriented model and provides the neces-that requires a signiﬁcant amount of experience and skills. sary level of abstraction required by most subsequent reengi-Current tools provide only minimal support in this activity neering activities (cf. Figure 1). Currently, several tools sup-beyond visualizing the structure of an extracted schema cat- port data abstraction. However, in practice, most of them arealog. of limited use because they fail to fulﬁll at least one of the following two requirements:Even though it is unlikely that the cognitive task of data anal-ysis can ever be fully automated, computer-aided reverse en-gineering tools have the potential to dramatically reduce the • Iteration. The data reverse engineering process in-effort spent in this phase. They could be a major aid in search- volves a sequence of analysis and abstraction activitiesing, collecting, and combining indicators for structural and with several cycles of iteration. After an initial analysissemantic schema constraints and guiding the reengineer from phase, the reengineer produces an initial abstract designan initially incomplete data model to a complete and consis- that serves as the basis for discussion with domain ex-tent result. However, to achieve this kind of support, current perts and further investigations. This ﬁrst abstract de-data reverse engineering tools need to overcome the follow- sign needs to be altered as new knowledge about theing two signiﬁcant problems: legacy system becomes available. Although iteration is not well supported by current tools, an incremental • Imperfect knowledge. Data analysis inherently deals change propagation mechanism is presented by Jahnke with uncertain assumptions and heuristics about legacy and Wadsack . data models . Combining detected semantic indi- cators (e.g., stereotypical code patterns or instances of • Bidirectional mapping process. Current data reverse hypothetical naming conventions in the schema catalog) engineering tools follow a strictly bottom-up data ab- often leads to uncertain and/or contradicting analysis re- straction process, that is, the abstraction is produced sults. Data reverse engineering tools have to tolerate through a transformation of the analyzed logical data imperfect knowledge to support this interactive process model. This approach is less adequate if a pre-existing and to incrementally guide the reengineer to a consistent partial design for the data structure is available from data model. documentation or the knowledge of domain experts or developers. Using such information efﬁciently in re- • Customizability. Legacy information systems are verse engineering legacy information systems would re- based on many different hardware and software plat- quire a hybrid bottom-up/top-down abstraction process. forms and programming languages. Their data models Furthermore, such a process is required when more than have been developed using various design conventions one legacy data structure has to be mapped to a common
abstract data model (e.g., when several information sys- toolset a typical software engineer calls upon in day-to-day tems are federated or integrated with a data warehouse). usage . Perhaps the biggest challenge to increased ef- fectiveness of reverse engineering tools is wider adoption: tools can’t be effective if they aren’t used, and most soft-Research Direction ware engineers have little knowledge of current tools andBased on this discussion, the reverse engineering community their capabilities. While there is a relatively healthy marketneeds to develop tools that provide more adequate support for for unit-testing tools, code debugging utilities, and integratedhuman reasoning in an incremental and evolutionary reverse development environments, the market for reverse engineer-engineering process that can be customized to different ap- ing tools remains quite limited.plication contexts. In addition to awareness, adoption represents a critical bar-4 REVERSE ENGINEERING TOOLS rier. Most people lack the necessary skills needed to makeTechniques used to aid program understanding can be proper use of reverse engineering tools. The root of the adop-grouped into three categories: unaided browsing, leveraging tion problem is really two-fold: a lack of software analysiscorporate knowledge and experience, and computer-aided skills on the part of today’s software engineers, and a lacktechniques like reverse engineering . of integration between advanced reverse engineering toolsUnaided browsing is essentially “humanware”: the software and more commonplace software utilities such as those men-engineer manually ﬂips through source code in printed form tioned above. The art of program understanding requiresor browses it online, perhaps using the ﬁle system as a nav- knowledge of program analysis techniques that are essen-igation aid. This approach has inherent limitations based on tially tool-independent. Since most programmers lack thisthe amount of information that a software engineer may be type of foundational knowledge, even the best of tools won’table to keep track of in his or her head. be of much help.Leveraging corporate knowledge and experience can be ac- From an integration perspective, most reverse engineeringcomplished through mentoring or by conducting informal tools attempt to create a completely integrated environ-interviews with personnel knowledgeable about the subject ment in which the reverse engineering tool assumes it hassystem. This approach can be very valuable if there are peo- overall control. However, such an approach precludes theple available who have been associated with the system as it easy integration of reverse engineering tools into toolsetshas evolved over time. They carry important information in commonly used in both academic research and in indus-their heads about design decisions, major changes over time, try. In a UNIX-like environment, the established troika ofand troublesome subsystems. edit/compile/debug tools are common . Representative tools in this group include emacs and vi for editing, gcc forFor example, corporate memory may be able to provide guid- compiling, and gdb for debugging. In a Windows NT envi-ance on where to look when carrying out a new maintenance ronment, the tools may have different names, but they serveactivity if it is similar to another change that took place in the similar purposes. The only real difference is cost and choice.past. This approach is useful both for gaining a big- picture A recent case study  illustrates the challenges facing stu-understanding of the system and for learning about selected dents in a short-term project and the difﬁculties they face insubsystems in detail. solving the problem. Learning how to effectively use a re-However, leveraging corporate knowledge and experience is verse engineering tool is low on their list of priorities, evennot always possible. The original designers may have left the when such a tool is available.company. The software system may have been acquired from In a corporate setting, the situation is not so very different.another company. Or the system may have had its mainte- A relatively short project often means little time to learn newnance out-sourced. In these situations, computer-aided re- tools. The tools used in a commercial software developmentverse engineering is necessary. A reverse-engineering en- ﬁrm may be slightly richer than those in the academic setting.vironment can manage the complexities of program under- However, displacing an existing tool with a new tool—evenstanding by helping the software engineer extract high-level if that tool is arguably better—is an extremely difﬁcult task.information from low-level artifacts, such as source code.This frees software engineers from tedious, manual, and What Can Be Doneerror-prone tasks such as code reading, searching, and pattern To address the challenges of reverse engineering tool effec-matching by inspection. tiveness, there are several possible avenues to explore. These candidate solutions should address the two primary issuesCurrent Tool Effectiveness identiﬁed above: awareness and adoption. First, computerGiven that reverse engineering tools seem to be a key to aid- science and software engineering curriculums can encourageing program understanding, how effective are today’s offer- greater use of reverse engineering tools. They can carefullyings in meeting this goal? In both academic and corporate balance code synthesis (which is commonly taught) with pro-settings, reverse engineering tools have a long way to go be- gram analysis (which is rarely taught). By learning the analy-fore becoming an effective and integral part of the standard
sis techniques used in the art of program understanding, stu- • user studies,dents would be in a better position to leverage the capabili- • ﬁeld observations,ties of reverse engineering tools that can automate many of • case studies, andthe analysis tasks. • surveys.To increase the adoption rate of reverse engineering tools,vendors need to address several issues. The tools need to be In general, there has been a lack of evaluation of reverse en-better integrated with common development environments gineering tools , but there are some examples where theon the popular platforms. They also need to be easier to investigative techniques listed above have been used for eval-use. A lengthy training period is a strong disincentive to tool uating tools. In this section, we describe these techniques andadoption. give examples of when these techniques have been applied to the evaluation of reverse engineering tools.An issue related to both integration and ease-of-use is “goodenough” or “just in time” understanding. If one watches how Expert reviewsa software engineer uses other tools, they rarely exercise all Expert reviews are a set of informal investigative techniquesof the tool’s functionality. Indeed, the 80/20 rule seems to ap- that are very effective for evaluating tools in the area ofply: 80% of the time they use less than 20% of the tool’s ca- human-computer interaction . One of these techniques,pabilities. If the critical capabilities that constitute the 20% heuristic evaluation, involves a set of expert reviewers cri-of commonly used functions were identiﬁed, vendors might tiquing the interface using a short list of design criteria .be better able to integrate at least this level of support into Cognitive walkthroughs, another expert review technique,other vendors’ environments. For example, the use of sim- involve experts simulating users walking through the inter-ple tools such as grep to look for patterns in source code is face to carry out typical tasks.inefﬁcient. These inefﬁciencies are the result of inexactness Expert reviews can be applied at any stage in the tool’s de-of regular expressions versus programming language syntax sign life cycle, and are normally not as expensive or as time-and semantics, as well as the large number of false positive consuming as more formal methods. For example, a reversematches. Yet grep is still widely used because of cost, avail- engineering tool developer could use the Technology Deltaability and ease of use. Perhaps simply augmenting grep with Framework developed by Brown and Wallnau  to do anmore context-dependent or domain-aware capabilities would introspective evaluation of their own tool in the early stagesbe a better approach than a full-ﬂedged search engine, with a of development. This framework supports technology eval-new pattern language, a proprietary repository, and tangential uation in two ways: understanding how the technology dif-capabilities. fers from other technologies and then considering how these5 EVALUATING REVERSE ENGINEERING TOOLS differences will support the users’ needs. This type of evalu-This paper includes many references to tools and techniques ation is very useful but is often overlooked for sophisticatedto support reverse engineering. But an important considera- research tools such as reverse engineering tools.tion when choosing a path through these technologies, is how User studiesto measure the success of the tools or theories that may be User studies are formal experiments where key factors (theselected. Many reverse engineering tools concentrate on ex- independent variables) are identiﬁed and manipulated totracting the structure or architecture of a legacy system with measure their effects on other factors (the dependent vari-the goal of transferring this information into the minds of the ables). Experiments can be conducted either in a laboratorysoftware engineers trying to maintain or reuse it. That is, the or in the ﬁeld. In a laboratory setting, there is more con-tool’s purpose is to increase the understanding that software trol over the independent variables in the experiment. How-engineers or/and managers have of the system being reverse ever, other factors are introduced which may not be applica-engineered. But, since there is no agreed-upon deﬁnition or ble in more realistic situations. For example, students are of-test of understanding , it is difﬁcult to claim that program ten used to act as subjects, but students probably do not com-comprehension has been improved when program compre- prehend programs in the same way that industrial program-hension itself cannot be measured. mers do . Fenton and Pﬂeeger refer to formal experi-Despite such difﬁculty, it is generally agreed that more ef- ments as research in the small . User studies are morefective tools could reduce the amount of time that maintain- appropriate for ﬁne-grained analyses of software engineeringers need to spend understanding software or that these tools activities or processes.could improve the quality of the programs that are being In general, there have been relatively few formal experimentsmaintained. Coarse-grained analyses of these types of results to evaluate reverse engineering tools. However there are acan be attempted. There are several investigative techniques few exceptions, most notably [12, 49, 78, 79].and empirical studies that may be appropriate for studying thebeneﬁts of reverse engineering tools . These include: Field observations Formal user studies in the ﬁeld can be more difﬁcult to exe- • expert reviews, cute than those in a laboratory setting, because they tend to
be more expensive and time consuming. However, informal The ﬁve tools they examined were: Rigi , the Dali work-user studies where one or two programmers are observed in bench , the Software Bookshelf , CIA , andtheir natural setting can be very insightful. Often a researcher SNiFF+ . Their investigations focused on the abstractionwill only have the opportunity to observe one or two pro- and visualization of system components and interactions.grammers. Although the observation may be intrusive on the Surveysprogrammers, this technique gives the researcher the oppor- Surveys are normally used as a retrospective investigativetunity to observe maintainers using tools in more realistic set- technique. For example, surveys can ask questions of the na-tings. However, the results from ﬁeld observations may also ture: Did the use of tool A reduce the amount of time you hadbe difﬁcult to generalize because of the small number of sub- to spend doing maintenance changes? Although infrequentlyjects normally involved. used in the ﬁeld of psychology of programming, surveys canVon Mayrhauser and Vans observed programmers in an in- be useful as a form of exploratory research .dustrial setting performing a variety of maintenance activi- Cross et al. designed a preference survey to informally eval-ties . The goal of their study was to validate their inte- uate the GRASP software visualization tool . GRASPgrated code comprehension model. They derived reverse en- uses a Control Structure Diagram (CSD), an algorithmic levelgineering tool capabilities from an analysis of audio-taped, graphical representation of the software. The CSD was com-think-aloud reports of the programmers’ information needs pared to four other graphical diagrams .during maintenance activities. Sim et al. conducted a survey using a web-based question-Singer and Lethbridge describe a ﬁeld experiment to study naire to ﬁnd archetypes (i.e., typical or standard examples)the work practices of software engineers working at a large of source code searching by maintainers . Their resultstelecommunications company . They combined various found that the most commonly used tools for searching wereinvestigative techniques to gather information on software (by increasing usage): editors, grep, ﬁnd, and integrated de-engineers’ work practices, such as questionnaires issued on velopment environments. Administering the questionnairethe Web, longitudinal observations of several software engi- over the Web was found to be very effective for informationneers, and company wide tool usage statistics. They used the gathering.results from their studies to motivate the design of a softwareexploration tool called TkSEE (Software Exploration Envi- Summaryronment) . This section reviewed various experimental techniques for evaluating and comparing software exploration tools, an im-Case studies portant category of reverse engineering tools. Each of the in-Case studies occur when a particular tool is applied to a spe- vestigative techniques just described has certain advantagesciﬁc system, and the experimenter, often introspectively, doc- and disadvantages. However, combining these techniquesuments the activities involved. Case studies are particularly (as Singer and Lethbridge have done ) should produceuseful when the experimenter has very little control over the stronger results. Moreover, sharing results among researchfactors to be studied. Expert reviews can be combined with groups is also very important. For example, Sim and Storeyspeciﬁc case studies as a more powerful evaluation tech- chaired a workshop where several reverse engineering toolsnique. were compared in a live demonstration . The tools wereBellay and Gall report an evaluation of four reverse engi- applied to a signiﬁcant case study where each team had toneering tools that analyze C source code : Reﬁne/C , complete a series of software maintenance and documenta-Imagix 4D , SNiFF+ , and Rigi . They inves- tion tasks and collaboration between teams was emphasized.tigated the capabilities of these tools by applying them to Adoption of reverse engineering technology in industry hasa real-world embedded software system which implements been very slow . However, we observed in our user stud-part of a train control system. They used a number of assess- ies [78, 79] that usability is often a major concern. If the toolment criteria derived from Brown and Wallnau’s Technology is difﬁcult to use, it will affect its adoption rate, no matter howDelta Framework . The main focus of their case study useful it may be.was on the tool capabilities to generate graphical reports suchas call trees, control-ﬂow graphs, and data-ﬂow graphs . 6 CONCLUSIONSThey concluded that there is no single tool that is the ’best’ The 1980s produced a solid foundation for our ﬁeld with theas the four tools differ considerably in their respective func- Laws of Software Evolution , theories for the fundamen-tionalities. tal strategies of program comprehension [14, 48, 60], and a taxonomy for reverse engineering . We also realized thatArmstrong and Trudeau also evaluated several reverse en- ﬁfty to ninety percent of evolution effort involves programgineering tools. They based their evaluation on the abili- understanding .ties of the tools to extract an architectural design from thesource code of CLIPS (C-Language Interface processing The 1990s began with a series of papers that outlined chal-System) and for browsing the Linux operating system . lenges and research directions for the decade [20, 35, 66, 67,
63, 88]. During that decade, the reverse engineering com- sues for the next decade. For the future, it is critical that wemunity developed infrastructures and tools for the three ma- can effectively answer questions, such as “How much knowl-jor components of a reverse engineering system: parsers, a edge, at what level of abstraction, do we need to extract fromrepository, and a visualization engine. Researchers devel- a subject system, to make informed decisions about reengi-oped strategies for speciﬁc reengineering scenarios [13, 30, neering it?” Thus, we need to tailor and adapt the program32, 45], and as a result investigated program understanding understanding tasks to speciﬁc reengineering objectives.technology for these scenarios using industrial-strength re- We will never be able to predict all needs of the reverse engi-verse engineering and transformation tools . neers and, therefore, must develop tools that are end-user pro-Even though the theory of parsing and its technology has grammable . Pervasive scripting is one successful strat-been around since the 1960s, robust parsers for legacy lan- egy to allow the user to codify, customize, and automate con-guages and their dialects are still not readily available . tinuous understanding activities and, at the same time, inte-A notable exception is the IBM VisualAge C++ environment, grate the reverse engineering tools into his or her personalwhich features an API to access the complete abstract syntax software development process and environment. Infrastruc-tree . Fortunately, the urgency of the Year 2000 problem tures for tool integration have evolved dramatically in recenthas made the availability of stand-alone parsers a top priority. years. We expect that control, data, and presentation integra-But there is more research needed to produce parsing compo- tion technology will continue to advance at amazing rates.nents that can be easily integrated with reverse engineering Finally, we need to evaluate reverse engineering tools andtools. technology in industrial settings with concrete reengineering tasks at hand.With the proliferation of object technology, the expectationswere high during the early 1990s for a common object- Even if we perfect reverse engineering technology, there areoriented repository to store all the artifacts being accumu- inherent high costs and risks in evolving legacy software sys-lated during the evolution of a software system. The research tems. Developing strategies to control these costs and riskscommunity made great strides in modelling collections of is a key research direction for the future. Practitioners need asoftware artifacts at various levels of abstraction using graphs reengineering economics book, which would serve as a guideand developing object-oriented schemas for these models, to determine reengineering costs and to use economic analy-but in most cases the artifacts for multi million-line software ses for making improved reengineering decisions.systems were stored in relational databases and ﬁle systems. Probably the most critical issue for the next decade is to teachThe past decade produced many software exploration students about software evolution. Computer science, com-tools [12, 18, 23, 29, 42, 52, 53, 54, 61, 65, 73, 77]. We puter engineering, and software engineering curricula, by andﬁnally have enough desktop computing power to manipulate large, teach software construction from scratch and neglect tohuge graphs of software artifacts effectively. Some software teach software maintenance and evolution. Contrast this sit-exploration tools are now built using web browsers to uation with electrical or civil engineering, where the study ofexploit the fact that the users intimately know these tools for existing systems and architectures constitutes a major part ofexploring dependencies . the curriculum. Concepts such as architecture, abstraction, consistency, completeness, efﬁciency, or robustness shouldThis paper presented four perspectives on the ﬁeld of reverse be taught from both a software design and a software analy-engineering to provide a roadmap for the ﬁrst decade of the sis perspective. Software architecture courses are now estab-new millennium. Researchers will continue to develop tech- lished in many computer science programs, but topics suchnology and tools for generic reverse engineering tasks, partic- as software evolution, reverse engineering, program under-ularly for data reverse engineering (e.g., the recovery of logi- standing, software reengineering, or software migration arecal and conceptual schemas), but future research ought to fo- rare. We must aim for a balance between software analysiscus on ways to make the process of reverse engineering more and software construction in software engineering curricula.repeatable, deﬁned, managed, and optimized . We needto integrate forward and reverse engineering processes for ACKNOWLEDGEMENTSlarge evolving software systems and achieve the same appre- This research was supported in part by NSERC, the Nationalciation for product and process improvement for long-term Sciences and Engineering Research Council of Canada, byevolution as for the initial development phases . CAS, the IBM Toronto Centre for Advanced Studies, by CSER, the Canadian Consortium for Software EngineeringThe most promising direction in this area is the continuous Research, by IRIS, the Institute for Robotics and Intelli-program understanding approach . The premise that soft- gent Systems Network of Centres for Excellence, by ASI,ware reverse engineering needs to be applied continuously the British Columbia Advanced Systems Institute, by thethroughout the lifetime of the software and that it is important Carnegie Mellon Software Engineering Institute, and theto understand and potentially reconstruct the earliest design Universities of Alberta, Paderborn, Riverside, and Victoria.and architectural decisions  has major tool design impli-cations. Tool integration and adoption should be central is- REFERENCES
 P. Aiken. Data Reverse Engineering: Slaying the  R. Brooks. Towards a theory of comprehension of com- Legacy Dragon. McGraw-Hill, 1995. puter programs. International Journal of Man-Machine Studies, 18:86–98, 1983.  M. Armstrong and C. Trudeau. Evaluating architec- tural extractors. In Proceedings of the 5th Working Con-  A. Brown and K. Wallnau. A framework for evaluat- ference on Reverse Engineering (WCRE-98), Honolulu, ing software technology. IEEE Software, pages 39–49, Hawaii, USA, pages 30–39, October 1998. September 1996.  L. Bass, P. Clements, and R. Kazman. Software Archi-  B. Brown, X. Malveau, X. M. III, and T. Mowbray. tecture in Practice. Addison-Wesley, 1997. AntiPatterns: Refactoring Software, Architectures, and  A. Behm, A. Geppert, and K. R. Dittrich. On the Projects in Crisis. John Wiley & Sons, 1998. migration of relational schemas and data to object-  E. Buss, R. DeMori, W. Gentleman, J. Henshaw, oriented database systems. In Proceedings 5th In- H. Johnson, K. Kontogiannis, E. Merlo, H. M¨ ller, u ternational Conference on Re-Technologies for In- J. Mylopoulos, S. Paul, A. Prakash, M. Stanley, S. R. formation Systems, Klagenfurt, Austria, pages 13– ¨ Tilley, J. Troster, and K. Wong. Investigating reverse 33. Osterreichische Computer Gesellschaft, December engineering technologies for the cas program under- 1997. standing project. IBM Systems Journal, 33(3):477–500,  B. Bellay and H. Gall. An evaluation of reverse engi- August 1994. neering tool capabilities. Journal of Software Mainte-  Y.-F. Chen, M. Nishimoto, and C. Ramamoorthy. The nance: Research and Practice, 10:305–331, 1998. C information abstraction system. IEEE Transactions  K. Bennett and V. Rajlich. Software maintenance and on Software Engineering, 16(1):325–334, March 1990. evolution: A roadmap. In this volume, June 2000.  S. Chidamber and C. Kemerer. A metrics suite for  J. Bergey, D. Smith, N. Weiderman, and S. Woods. Op- object-oriented design. IEEE Transactions Software tions analysis for reengineering (OAR): Issues and con- Engineering, 20(6):476–493, 1994. ceptual approach. Technical Report CMU/SEI-99-TN- 014, Carnegie Mellon Software Engineering Institute,  E. Chikofsky and J. Cross. Reverse engineering and de- 1999. sign recovery: A taxonomy. IEEE Software, 7(1):13– 17, January 1990.  T. Biggerstaff, B. Mitbander, and D. Webster. Pro- gram understanding and the concept assignment prob-  R. Clayton, S. Rugaber, and L. Wills. On the knowl- lem. Communications of the ACM, 37(5):72–83, May edge required to understand a program. In Proceedings 1994. of the 5th Working Conference on Reverse Engineer- ing (WCRE-98), Honolulu, Hawaii, USA, pages 69–78,  A. Blackwell. Questionable practices: The use of ques- October 1998. tionnaire in psychology of programming research. The Psychology of Programming Interest Group Newsletter,  A. Clewett, D. Franklin, and A. McCown. Network Re- 22, July 1998. source Planning For SAP R/3, BAAN IV, and PEOPLE- SOFT: A Guide to Planning Enterprise Applications. M. Blaha. On reverse engineering of vendor databases. McGraw-Hill, 1998. In Working Conference on Reverse Engineering (WCRE-98), Honolulu, Hawaii, USA, pages 183–190.  M. Consens, A. Mendelzon, and A. Ryman. Visualiz- IEEE Computer Society Press, October 1998. ing and querying software structures. In Proceedings M. Blaha and W. Premerlani. Observed idiosyncracies of the 14th International Conference on Software Engi- of relational database designs. In Second Working Con- neering (ICSE), Melbourne, Australia, pages 138–156. ference on Reverse Engineering (WCRE-95), Toronto, IEEE Computer Society Press, 1992. Ontario, Canada. IEEE Computer Society Press, 1995.  J. Cross II, T. Hendrix, L. Barowski, and K. Mathias. K. Brade, M. Guzdial, M. Steckel, and E. Soloway. Scalable visualizations to support reverse engineering: Whorf: A visualization tool for software maintenance. A framework for evaluation. In Proceedings of the 5th In Proceedings 1992 IEEE Workshop on Visual Lan- Working Conference on Reverse Engineering (WCRE- guages, Seattle, Washington, pages 148–154, Septem- 98), Honolulu, Hawaii, USA, pages 201–209, October ber 1992. 1998. M. Brodie and M. Stonebraker. Migrating Legacy Sys-  J. Cross II, S. Maghsoodloo, and T. Hendrix. The con- tems: Gateways, Interfaces, and the Incremental Ap- trol structure diagram: An initial evaluation. Empirical proach. Morgan Kauffman, 1995. Software Engineering, 3(2):131–156, 1998.
 C. Fahrner and G. Vossen. Transforming relational  J. H. Jahnke and J. Wadsack. Integration of analysis database schemas into object-oriented schemas accord- and redesign activities in information system reengi- ing to ODMG-93. In Proceedings of the 4th Interna- neering. In Proceedings of the 3rd European Con- tional Conference on Deductive and Object-Oriented ference on Software Maintenance and Reengineering Databases, 1995. (CSMR-99), Amsterdam, The Netherlands, pages 160– 168. IEEE CS, March 1999. N. Fenton and S. L. Pﬂeeger. Software Metrics: A Rig- orous and Practical Approach. PWS Publishing Com-  R. Kazman and S. Carrie‘re. Playing detective: Recon- pany, 1997. structing software architecture from available evidence. Journal of Automated Software Engineering, 6(2):107– P. Finnigan, R. Holt, I. Kalas, S. Kerr, K. Kontogiannis, 138, April 1999. H. M¨ ller, J. Mylopoulos, S. Perelgut, M. Stanley, and u K. Wong. The software bookshelf. IBM Systems Jour-  R. Kazman, S. Woods, and S. Carri` re. Requirements e nal, 36(4):564–593, 1997. for integrating software architecture and reengineering models: CORUM II. In Proceedings of the Fifth Work- P. Finnigan, R. Holt, I. Kalas, S. Kerr, K. Kontogiannis, ing Conference on Reverse Engineering (WCRE-98), H. M¨ ller, J. Mylopoulos, S. Perelgut, M. Stanley, and u Honolulu, Hawaii, USA, pages 154–163. IEEE Com- K. Wong. The software bookshelf. IBM Systems Jour- puter Society Press, October 1998. nal, 36(4):564–593, November 1997.  U. K¨ lsch. Methodische Integration und Migration o M. Fowler. Refactoring: Improving the Design of Ex- von Informationssystemen in objektorientierte Umge- isting Code. Addison-Wesley, 1999. bungen. PhD thesis, Forschungszentrum Informatik, Universit¨ t Karlsruhe, Germany, December 1999. a E. Gamma, R. Helm, R. Johnson, and J. Vlissides. De- sign Patterns: Elements of Reusable Object-Oriented  K. Kontogiannis, J. Martin, K. Wong, R. Gregory, Software. Addison-Wesley, 1995. H. M¨ ller, and J. Mylopoulos. Code migration through u transformations: An experience report. In Proceedings I. Graham. Migrating to Object Technology. Addison- of CASCON-98, Toronto Ontario, Canada, November Wesley, 1994. 1998. J.-L. Hainaut, J. Henrard, J.-M. Hick, and D. Roland.  M. Lehman. Programs, life cycles and laws of software Database design recovery. Lecture Notes in Computer evolution. Proceedings of IEEE Special Issue on Soft- Science, 1080:272ff, 1996. ware Engineering, 68(9):1060–1076, September 1980. W. Harrison, H. Ossher, and P. Tarr. Software engineer-  T. Lethbridge and J. Singer. Understanding software ing tools and environments: A roadmap. In this volume, maintenance tools: Some empirical research. In IEEE June 2000. Workshop on Empirical Studies of Software Mainte- P. Hausler, M. Pleszkoch, R. Linger, and A. Hevner. Us- nance (WESS-97), Bari, Italy, pages 157–162, October ing function abstraction to understand program behav- 1997. ior. IEEE Software, 7(1):55–63, January 1990.  S. Letovsky. Cognitive Processes in Program Compre- W. S. Humphrey. Managing the Software Process. hension, pages 58–79. Ablex Publishing Corporation, Addison-Wesley, 1989. 1986. W. S. Humphrey. A Discipline for Software Engineer-  P. Linos, P. Aubet, L. Dumas, Y. Helleboid, P. Lejeune, ing. Addison-Wesley, 1995. and P. Tulula. Visualizing program dependencies: An experimental study. Software–Practice and Experience, Imagix 4D. Imagix Corp. http://www.imagix.com. 24(4):387–403, April 1994. J. H. Jahnke. Management of Uncertainty and Inconsis-  J. Martin. Leveraging ibm visualage c++ for reverse tency in Database Reengineering Processes. PhD the- engineering tasks. In Proceedings of CASCON-99, sis, Department of Mathematics and Computer Science, Toronto, Ontario, Canada, November 1999. Universit¨ t Paderborn, Germany, September 1999. a  P. Martin, J. R. Cordy, and R. Abu-Hamdeh. Infor- J. H. Jahnke, W. Sch¨ fer, and A. Z¨ ndorf. Generic a u mation capacity preserving of relational schemas us- fuzzy reasoning nets as a basis for reverse engineering ing structural transformation. Technical Report ISSN relational database applications. In Proceedings of Eu- 0836-0227-95-392, Department of Computing and In- ropean Software Engineering Conference (ESEC/FSE), formation Science, Queen’s University, Kingston, On- number 1302 in LNCS. Springer, September 1997. tario, Canada, November 1995.
 A. Mendelzon and J. Sametinger. Reverse engineering  B. A. Price, R. M. Baecker, and I. S. Small. A principled by visualizing and querying. Software Concepts and taxonomy of software visualization. Journal of Visual Tools, 16:170–182, 1995. Languages and Computing, 4(3):211–266, 1993. H. M¨ ller and K. Klashinsky. Rigi—A system for u  C. Rich and L. Wills. Recognizing a program’s design: programming-in-the-large. In Proceedings of the A graph-parsing approach. IEEE Software, 7(1):82–89, 10th International Conference on Software Engineer- January 1990. ing (ICSE), Rafﬂes City, Singapore, pages 80–86. IEEE Computer Society Press, April 1988.  S. Rugaber and S. Ornburn. Recognizing design deci- sions in programs. IEEE Software, 7(1):46–54, January H. M¨ ller, S. Tilley, M. O. B. Corrie, and N. Mad- u 1990. havji. A reverse engineering environment based on spatial and visual software interconnection models. In  M. Shaw. Software engineering education: A roadmap. Proceedings of the Fifth ACM SIGSOFT Symposium on In this volume, June 2000. Software Development Environments (SIGSOFT-92),  B. Shneiderman. Designing the User Interface: Tyson’s Corner, Virginia, USA, In ACM Software En- Strategies for Effective Human-Computer Interaction. gineering Notes, volume 17, pages 88–98, December Addison-Wesley, 1998. Third Edition. 1992.  O. Signore, M. Loffredo, M. Gregori, and M. Cima. Re- T. Munakata. Knowledge discovery. Communications construction of er schema from database applications: of the ACM, 42(11):26–29, November 1999. a cognitive approach. In Proceedings of 13th Interna- G. Murphy, D. Notkin, and S. Lan. An empirical study tional Conference of ERA, Manchester, UK, pages 387– of static call graph extractors. In Proceedings of the 402. Springer, 1994. 18th International Conference on Software Engineer-  S. Sim, C. Clarke, and R. Holt. Archetypal source ing, Berlin, Germany, pages 90–100. IEEE Computer code searches: A survey of software developers and Society Press, March 1996. maintainers. In Proceedings of the 5th Working Con- J. Nielsen. Usability Engineering. Academic Press, ference on Reverse Engineering (WCRE-98), Honolulu, New York, 1994. Hawaii, USA, pages 180–187, October 1998. J. Ning. A Knowledge-based Approach to Auto-  S. Sim and M.-A. D. Storey. A collective matic Program Analysis. PhD thesis, Department of demonstration of program comprehension tools, Computer Science, University of Illinois at Urbana- a CASCON-99 workshop, November 1999. Champaign, 1989. http://www.csr.uvic.ca/cascon99/. S. Paul and a. Prakash. On formal query languages for  J. Singer and T. Lethbridge. Studying work practices source code search. IEEE Transactions on Software En- to assist tool design in software engineering. In Pro- gineering, SE-20(6):463–475, June 1994. ceedings of the 6th International Workshop on Program Comprehension (WPC-98), Ischia, Italy, pages 173– N. Pennington. Stimulus structures and mental repre- 179, June 1998. sentations in expert comprehension of computer pro- grams. Cognitive Psychology, 19:295–341, 1987.  SNiFF+. User’s Guide and Reference, Take- Five Software, version 2.3, December 1996. P. Penny. The Software Landscape: A Visual Formal- http://www.takeﬁve.com. ism for Programming-in-the-Large. PhD thesis, De- partment of Computer Science, University of Toronto,  T. Standish. An essay on software reuse. IEEE Trans- 1992. actions on Software Engineering, SE-10(5):494–497, September 1984. D. Perry, A. Porter, and J. L. Votta. Empirical studies: A roadmap. In this volume, June 2000.  P. Stevens and R. Pooley. Systems reengineering pat- terns. In ACM SIGSOFT Foundations of Software En- R. C. W. Peter G. Selfridge and E. J. Chikofsky. Chal- gineering (FSE-98), Lake Buena Vista, Florida, USA, lenges to the ﬁeld of reverse engineering. In Working pages 17–23. ACM Press, 1998. Conference on Reverse Engineering (WCRE-93), Bal- timore, Maryland, USA, pages 144–150, 1993.  M.-A. Storey and H. M¨ ller. Manipulating and doc- u umenting software structure using shrimp views. In W. J. Premerlani and M. R. Blaha. An approach for re- Proceedings of the International Conference on Soft- verse engineering of relational databases. Communica- ware Maintenance (ICSM), Opio, France, pages 275– tions of the ACM, 37(5):42–49, May 1994. 284. IEEE Computer Society Press, October 1998.
 M.-A. Storey, K. Wong, P. Fong, D. Hooper, K. Hop-  K. Wong. Reverse Engineering Notebook. PhD thesis, kins, and H. M¨ ller. On designing an experiment to u Department of Computer Science, University of Victo- evaluate a reverse engineering tool. In Proceedings ria, October 1999. of the 3rd Working Conference on Reverse Engineering (WCRE-96), Monterey, California, USA, pages 31–40,  K. Wong, S. Tilley, H. M¨ ller, and M.-A. Storey. Struc- u November 1996. tural redocumentation. IEEE Software, 12(1):46–54, January 1995. M.-A. Storey, K. Wong, and H. M¨ ller. How do pro- u gram understanding tools affect how programmers un- derstand programs. In Proceedings of the 4th Working Conference on Reverse Engineering (WCRE-97), Ams- terdam, The Netherlands, pages 12–21, October 1997. T. Syst¨ . On the relationships between static and dy- a namic models in reverse engineering java software. In Proceedings of the Sixth Working Conference on Re- verse Engineering (WCRE-99), Atlanta, Georgia, USA, pages 304–313. IEEE Computer Society Press, October 1999. S. Tilley, K. Wong, M.-A. Storey, and H. M¨ ller. Pro- u grammable reverse engineering. International Journal of Software Engineering and Knowledge Engineering, 4(4):501–520, December 1994. S. R. Tilley. Coming attractions in program understand- ing II: Highlights of 1997 and opportunities for 1998. Technical Report CMU/SEI-98-TR-001, Carnegie Mel- lon Software Engineering Institute, February 1998. S. R. Tilley. The Canonical Activities of Reverse Engi- neering. Baltzer Science Publishers, The Netherlands, February 2000. S. R. Tilley and S. Huang. Just enough understanding and not enough time. Technical report, Department of Computer Sciene, University of California Riverside, December 1999. J. Troster, J. Henshaw, and E. Buss. Filtering for quality. In the Proceedings of CASCON-93, Toronto, Ontario, Canada, pages 429–449, October 1993. A. Umar. Application (Re)Engineering: Building Web- Based Applications and Dealing with Legacies. Pren- tice Hall, 1997. A. von Mayrhauser and A. Vans. From code under- standing needs to reverse engineering tool capabilities. In Proceedings of CASE-93, Singapore, pages 230–239, July 1993. R. C. Waters and E. J. Chikofsky. Reverse engineering—Introduction to the special section. Communications of the ACM, 37(5):22–25, May 1994. M. Weiser. Program slicing. IEEE Transactions on Soft- ware Engineering, SE-10(4):352–357, July 1984.