Aiding Program Comprehension by Static and Dynamic Feature Analysis

Thomas Eisenbarth¹, Rainer Koschke², Daniel Simon³

¹ Axivion GmbH   ² Universität Bremen   ³ SQS

ICSM 2011
Presentation of Most-Influential Paper ICSM 2001
This paper was joint work with my two colleagues.
These are the three authors at the time of the publication, ten years ago.
On the left you see Thomas Eisenbarth and on the right Daniel Simon.
Unfortunately, they cannot be here. They asked me to send their best
regards. Like me, they are very honored by this award.
Here are two more recent photographs of them.
They have not changed much. That is no surprise, since their main
expertise is in maintenance.
I remember ICSM 2001 very well. It was in a great location: Florence.
Florence is full of attractions and beauty.
It was a real surprise that anyone showed up at my talk in Florence.
Before I tell you more about the content of the paper, I would like to tell
you a bit about the history of the paper itself, that is, its development
process.
The initial trigger for the idea of our paper was the call for papers of a
German software product line workshop.
Fraunhofer Institute for Experimental Software Engineering (IESE)

Call for Papers
1st German Software Product Line Workshop
Kaiserslautern, 10 November 2000

Background
Developing similar products as a product line, or product family, offers
many advantages over the relatively expensive development of individual
systems. These advantages stem mostly from the fact that all family members
build on a common infrastructure, also called a platform or architecture.
While other industries, such as automotive manufacturing or the
entertainment industry, have long exploited the advantages of product-line
development systematically, most software systems are still built as
expensive one-off products. Yet software development in particular can
benefit from product lines: for example, through savings in time and cost
when developing new, similar products, or through higher product quality
due to a high proportion of reused, existing, and already proven
components. Adapting standard products to special customer requirements is
also made easier by variability planned in advance. By their nature,
product lines cover the entire software life cycle; hence they integrate
many other topics such as requirements engineering, software
architectures, and reengineering.
After about a decade of research, product lines for software systems are
receiving more and more attention, as reflected in the growing number of
international events on this subject. In Germany, too, product lines and
related topics attract increasing interest, as shown, among other things,
by the participation of various organizations in European projects such as
PRAISE and ESAPS.

Goal of the Workshop
The workshop aims to enable an exchange of experience between industry and
research in the area of software product lines and related topics.

Topics
Contributions on, but not limited to, the following topics are welcome:
• Planning of product lines
• Requirements engineering for product lines
• Modeling of product lines
• Traceability of requirements
• Configuration management for product lines
• Definition of software architectures
• Recovery of software architectures
• Reference architectures for product lines
• Evolution of architectures
• Component technology for product lines
• Reengineering with respect to product lines
• Industrial experience with product lines
• Product lines for SMEs
• Introduction of product-line approaches
Submissions must be sent in electronic form (PDF or PostScript) to
knauber@iese.fhg.de and should not exceed five pages. Further information
is available at http://www.iese.fhg.de/dspl-workshop.

Dates:
Submission of papers: 31 August 2000
Notification of acceptance: 1 October 2000
Submission of final version: 20 October 2000
Distribution of final program: 25 October 2000

Program committee:
• Dr. P. Knauber (Fraunhofer IESE)
• Prof. Dr. K. Pohl (Universität Essen)
• Prof. Dr. C. Atkinson (Universität Kaiserslautern)
• Dr. G. Böckle (Siemens AG)
• Dr.-Ing. K. Czarnecki (DaimlerChrysler AG)
• Prof. Dr. U. Eisenecker (FH Kaiserslautern)
• Prof. Dr. E. Plödereder (Universität Stuttgart)
• Prof. Dr. W. Pree (Universität Konstanz)
• Prof. Dr. D. Rombach (Fraunhofer IESE)
• S. Thiel (Robert Bosch GmbH)
• R. Trauter (DaimlerChrysler AG)
• Dr. M. Verlage (Market Maker Software AG)
In software product lines, product-feature maps describe, in the form of a
table, the commonalities and differences of the products with respect to
their features.
At that time, a German professor, Gregor Snelting, had introduced formal
concept analysis to software engineering.
I taught formal concept analysis as part of my reengineering class.
Concept analysis allows you to analyze such tables. In mathematical
terms, concept analysis is a technique for analyzing the structure of
arbitrary binary relations.
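To make "analyzing a binary relation" concrete, here is a minimal Python sketch; it is an illustration, not the tool we used. It assumes the table is given as a mapping from each object (here, a product) to the set of attributes (features) it has, and the product and feature names are hypothetical. A concept is a pair (extent, intent): a maximal set of objects together with the maximal set of attributes they all share, i.e., a maximal rectangle of filled table cells up to row and column permutation. The sketch relies on the fact that the intents are exactly the intersections of table rows.

```python
"""Naive formal concept analysis of a binary relation (a sketch)."""


def common_objects(attrs, relation):
    """Objects that possess every attribute in attrs."""
    return frozenset(obj for obj, row in relation.items() if attrs <= row)


def concepts(relation):
    """Return all formal concepts (extent, intent) of relation.

    relation maps each object to the set of attributes it has. The intents
    are exactly the intersections of object rows (plus the set of all
    attributes, the intersection over no rows), so we close the rows under
    intersection and then pair each intent with its extent.
    """
    all_attrs = frozenset().union(*relation.values())
    intents = {all_attrs}
    for row in relation.values():
        row = frozenset(row)
        # Intersect the new row with every intent found so far.
        intents |= {row & intent for intent in intents}
    return [(common_objects(intent, relation), intent) for intent in intents]


# Hypothetical product-feature map: product variant -> features it offers.
PRODUCT_FEATURES = {
    "lite": {"draw", "save"},
    "plus": {"save", "print"},
    "pro": {"draw", "save", "print"},
}

if __name__ == "__main__":
    ordered = sorted(concepts(PRODUCT_FEATURES),
                     key=lambda c: (len(c[0]), sorted(c[0])))
    for extent, intent in ordered:
        print(sorted(extent), "share", sorted(intent))
```

For this small table the sketch finds four concepts; for instance, all three variants share only save, while lite and pro share draw and save. The enumeration is exponential in the worst case; dedicated algorithms such as Ganter's NextClosure scale better.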
At that German workshop, we proposed using concept analysis to analyze
product-feature maps in software product lines.
I will describe this in more detail later.
However, we were more interested in program analysis than in
requirements engineering.
Another problem in product lines is identifying the components needed to
implement a feature; this is a prerequisite for finding reusable
components for the product line.
So we decided to use formal concept analysis to locate where features are
implemented in the code.
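The same machinery carries over to feature location. The following Python sketch is a simplified illustration under stated assumptions, not our original tool: each scenario invokes exactly one feature, and its execution trace is reduced to the set of subprograms it executed. All scenario and subprogram names are hypothetical (loosely modeled on Xfig). With scenarios as objects and subprograms as attributes, grouping subprograms by the set of scenarios that executed them yields the subprograms attached to each concept; a group executed by a single scenario is specific to that feature, and a group executed by all scenarios is shared infrastructure.

```python
"""Group subprograms by the scenarios that executed them (a sketch)."""
from collections import defaultdict

# Hypothetical execution traces: scenario (one feature each) -> executed subprograms.
TRACES = {
    "draw-arc": {"main_loop", "redisplay", "create_arc", "draw_arc"},
    "draw-ellipse": {"main_loop", "redisplay", "create_ellipse", "draw_ellipse"},
    "draw-text": {"main_loop", "redisplay", "create_text", "draw_text", "lookfont"},
}


def group_by_extent(traces):
    """Map each set of scenarios to the subprograms executed by exactly those scenarios.

    In concept-analysis terms, each key is a concept extent and each value is
    the set of subprograms whose attribute concept has that extent.
    """
    executed_by = defaultdict(set)
    for scenario, subprograms in traces.items():
        for subprogram in subprograms:
            executed_by[subprogram].add(scenario)
    groups = defaultdict(set)
    for subprogram, scenarios in executed_by.items():
        groups[frozenset(scenarios)].add(subprogram)
    return dict(groups)


def feature_specific(traces):
    """Subprograms executed by exactly one scenario, keyed by that scenario."""
    return {next(iter(extent)): subprograms
            for extent, subprograms in group_by_extent(traces).items()
            if len(extent) == 1}


if __name__ == "__main__":
    for scenario, subprograms in sorted(feature_specific(TRACES).items()):
        print(scenario, "->", sorted(subprograms))
    shared = group_by_extent(TRACES).get(frozenset(TRACES), set())
    print("shared by all scenarios:", sorted(shared))
```

In practice, traces would be recorded from the running program, and a scenario may accidentally invoke other features; the sketch ignores both issues.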
We submitted a paper describing this idea to CSMR.




[Slide: first page of the CSMR paper "Derivation of Feature Component Maps by means of Concept Analysis" by Thomas Eisenbarth, Rainer Koschke, and Daniel Simon, University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany, {eisenbts, koschke, simondl}@informatik.uni-stuttgart.de, overlaid with a concept lattice of Xfig drawing routines (e.g., draw_arc, draw_spline, draw_ellipse, draw_text); the page includes Figure 2, "Concept lattice for Table 1."]
finish_text_input
unconstrained_line




y comparison of the two resulting execution traces,
text_drawing_selected




                                                                                                                                                                                                          vides insights into binary relations. The mathematical                                  C2         ({o2, o3, o4}, {a3, a4})                                                                                                                                                                     exploits one feature. Thus, a single column in the relation                                                                          investigate quality (like maintainability, extractability, and
elastic_line





mouse_balloon

print_to_file

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            leads to an unstructured lattice; a lattice is said to be struc-                                                                     (ellipse, polygon, and open approximated spline). The                 introduce spurious connections between features. Fortu-                                                                                                                                                  ent strands of control with different functionality are                  [5] Chen, K. und Rajlich, V., ‘Case Study of Feature Location
                           Abstract                                tecture.                                                              for instance, with respect to maintainability, extract-                                                                                                                                                                    tion can be visualized in a more readable equivalent way              cohesive modules is that one does not know a priori                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       XRotDrawAlignedImageString




sing Dependence Graph’, Proc. of the 8th Int. Workshop on
the components can be identified that implement the                 • and to decide on further steps, like reengineering or
erase_lengths




                                                                                                                                                                                                          foundation of concept analysis was laid by Birkhoff in                                                                                                                                                                                                                                          table can be obtained per system run. Applying all usage           cally depicted, one ascertains in the lattice the closest         integrability) and to estimate effort for subsequent steps                                                                                                                                                                                                                                                                                                                                                                                                    see Figure 6                                                                                   tured if it can be decomposed into independent sublattices                                                                           resulting lattice is shown in Figure 7.                               nately, this problem can be partly fixed by providing a spe-                                                                                                                                              united in a single subprogram, possibly for efficiency rea-
                                                                                                                                         ability, and integrability.                                                                                                                                                                                                by marking only the graph node with an attribute a ∈ A                whether physical modules really group cohesive dec-
append_point




                                                                       One important piece of information for a product famrogram Comprehension, pp. 241-249, June 10-11, 2000,
create_point
clip_arrows




                                                                                                                                                                                                          1940. It has already been successfully used in other fields                              C3         ({o1}, {a1, a2})                                                                                                                                                                                                                                                common node toward the top element starting at thefeature.                                                             wrapping.
   Feature component maps describe which components                                                                                                                                                                                                                                                                                                                 whose represented concept is the most general concept                 larations; physical modules are the unscrutinized               scenarios provides the relation table.                                                                                               (wrapping, reengineering, or re-development from                                                                  node                                                                                                                   altlength_msg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          concept           that are connected via the top and bottom elements only).                                                                                                                                                  cific scenario in which only the accidentally invoked                                                                                                                                                     sons. For instance, we have found a subprogram in our                        Limerick, Ireland, IEEE Computer Society Press.
                                                                   ily analysis that tries to integrate existing assets is the so-    5. A product family platform is designed. Alternatives              of software engineering. The binary relation in our specific                                                                                                                                                                                                                                                                                                        nodes to which c1 and c2, respectively, are attached;


{arrow_bound}                mode_balloon                            node
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        length_msg

draw_mousefun_topruler




are needed to implement a particular feature and are used                                                                                                                                                                                                                                         C4         ({o2, o4}, {a3, a4, a5})                                                                                                     result of a programmer’s way of grouping declara-                  An execution trace can be recorded by a profiler. How-                                                                             scratch).


{create_mouse

hat is to say that there are many specific operations and                                                                                                                                                  irrelevant feature is invoked, which leads to a refactored            Wilde and Scully focus on localizing rather than deriv-               The technique presented in this paper yields the feature     case study that draws different kinds of objects. The func-
                                                                   called feature component map that describes which com-                for components to populate the product family plat-              application of concept analysis to derive the feature com-                                                                                                that has a in its intent. Analogously, a node will be marked                                                                                                                                                                                                                                                                                                                                                                                                      cmd_balloon}




he taller a concept is, the moreraudejus, H., Implementing a Concept Analysis Tool for
early in processes to develop a product family based on                                                                                                                                                                                                                                                                                                                                                                                                                                                   ever, most profilers only record subprogram calls but not           all features at and above this common node are thosefew shared operations and also that shared operations are                                                                                                                                                  concept lattice that contains a new concept that isolates the     ing required components: For deriving all required compo-             component map automatically using the execution traces           tion contained a large switch statement whose branches
                                                                                                                                                                                                                                                                                                                                                                    with an object o ∈ O if it represents the most special con-           tions whether it makes sense or not);
{setup_ind_panel




                                                                   ponents are needed to implement a particular feature. A               form are weighed: component extraction and reengi-                                                                                                       C5         ({o3, o4}, {a3, a4, a6, a7, a8})                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   set_line_stuff

components it contains.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Identifying Abstract Data Types in C Code, master thesis,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               3.3. Implementation
set_cursor




existing components. This paper describes a new tech-                                                                                                                                                     ponent map states which components are required when a                                                                                                                                                                                                                                          accesses to variables. Instead of using a symbolic debug-          jointly implemented by these components.
create_bitmaps


igure 4. Lattice for the first experiment                                                                                                                                                                                                                                                                    really used for many features.                                                                                                                                                                             irrelevant feature and its components. In our example,            nents, the execution trace for the including input set is             for different usage scenarios. The technique is based on         drew the specific kinds of objects. In the execution trace,
                                                                                                                                                                                                                                                                                                                                                                    cept that has o in its extent. The unique element µ in theniversity of Kaiserslautern, Germany, 1998.
process_pending

redisplay_zoomed_region




                                                                   feature is a realized (functional as well as non-functional)          neering, new development, integration of COTS, or                feature is invoked. This section describes concept analysis                             C6         ({o4}, {a3, a4, a5, a6, a7, a8})                                                                                           3. subprograms, i.e., functions and procedures, and glo-                                                                                                                                                                                                                                                                                                                                ...

main

nique to derive the feature component map and additional                                                                                                                                                                                                                                                                                                            concept lattice marked with a is therefore:                                                                                           ger, for example, that allows to set watchpoints on variable     • Components jointly required for two features, f1 and                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Concept #1 in Figure 4 is the largest concept (exclud-                                                                                       1
interferences due to an accidentally invoked irrelevant fea-      sufficient. By subtracting all components in the execution             concept analysis, a mathematical sound technique to ana-         this subprogram showed up for all objects where in fact                  [7] Lindig, C. and Snelting, G., ‘Assessing Modular Structure of
                                                                   requirement (the term feature is intentionally weakly                 wrapping.                                                        in more detail.                                                                                                                                                                                                                  bal variables of the system; subprograms and global                                                                                                                                                    The implementation of the described approach is sur-             firstly present a general overview of the results and sec-                                                                                                                                                                                                                                     scenarios. To identify all subprograms required for a sin-
                                                                                                                                                                                                                                                                                                             (∅, {a1, a2, a3, a4, a5, a6, a7, a8})                                                                                                                                                        accesses, or even to instrument the code if no sophisticateding the bottom element). It exploits a single feature “draw                                                                                                                                                ture appeared only at the two layers directly on top of the       trace for the excluding input set from those in the execu-            lyze binary relations, which has the additional benefits to       only specific parts of it were actually executed.
                                                                                                                                                                                                                                                                                                                                                                                         ∨ { c ∈ L(C ) a ∈ intent ( c ) }
dependencies utilizing dynamic information and conceptf2, are described by µ(f1) ∨ µ(f2); graphicallyegacy Code Based on Mathematical Concept Analysis’,
                                                                   defined because its exact meaning depends on the specific            6. A migration plan is prepared.                                       Concept analysis is based on a relation R between a set                                                                                                                                                                       variables will be called low-level components in the                                                                                                                                                prisingly simple (if one already has a tool for concept
analysis. The method is simple to apply, cost-effective,                                                                                                                                                                                                                                                                                                                       µ(a) =                                           (1)                                                                       profiler is available, one can also use a simple static                                                                                                                                                ondly go into further details for particular interesting                                                                                                                                                                                                                                         gle feature or a set of features, one can then analyze the                                                                                                                                 text object”. According to the lattice, the feature is largely                                                                                                                                             bottom element of the lattice, and could be more or less          tion trace for the invoking input set, only those compo-              reveal not only correspondences between features and                Furthermore, the success of the described approach                        Proc. of the Int. Conference on Software Engineering, pp.
                                                                   context). Components are computational units of a soft-                                                                                                                                                                                Table 2: Concepts for Table 1.                                                                                                   following.                                                                                                                        depicted, one ascertains in the lattice the closest com-
largely language independent, and can yield results                                                                                     The technique described in this article is used to derive         of objects O and a set of attributes A, hence R ⊆ O × A.                                                                                                     The unique element γ marked with object o is:                                                                                      dependency analysis: One considers all variables directly
analysis). Our prototype for a Unix environment is an            observations.                                                                                                                                                                                                                                                                                    concept lattice as described in Section 3.2.                                                                                                                                               independent from other features and shares only a few                                                                                                                                                      ignored.                                                          nents remain that specifically deal with the feature.                  components but also dependencies between features and            heavily depends on the clever choice of usage scenarios                      349-359, Boston, 1997.
                                                                   ware architecture (see Section 3.1). Because the feature                                                                                   The tuple C = (O, A, R) is called formal context. For a                        The set of all concepts of a given formal context forms                                                                                     Ideally, one will use alternative (1) when reliable and                                                                             mon node toward the bottom element starting at the                opportunistic integration of the following parts:
[…] quickly and very early in the process.

1. Introduction

Developing similar products as product families promises several advantages over relatively expensive separate developments, like lower costs and shorter time for development, test, and maintenance. These advantages are based on the fact that all family members share a common infrastructure, also known as platform architecture. There are many approaches to newly developing product families from scratch [2, 11]. However, according to Martinez [16], most successful examples of product families at Motorola originated in a single separate product. Only in the course […]

[…] component map is needed very early to trade off alternatives in good time, complete and hence time-consuming reverse engineering of the system is out of the question. In particular, the decision for a certain alternative will lead to a consolidation on specific economically important core components in many cases and hence to an exclusion of less important components. Any investment in a deep and costly pre-analysis of less important components would be in vain to a large degree. Instead, reverse engineering in early phases should give information on the feature component map quickly and with simple means. To this end, the product line analyst imparts all relevant features, for which the necessary components need to be detected, to the reverse engineer, who in turn delivers the feature component […]

[…] the feature component map, which plays a central role early in this process.

Overview. The technique described here is based on the execution traces generated by a profiler for different usage scenarios (see Figure 1). One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components (or are themselves considered components) required for a certain feature. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

In Section 3.1, the formal context for applying concept analysis to derive the feature component map will be laid down as follows:
• components will be considered objects,
• features will be considered attributes,
• a pair (component c, feature f) is in relation R if c is […]

[…] set of objects, O ⊆ 𝒪, the set of common attributes, σ, is defined as:

$$\sigma(O) = \{\, a \in \mathcal{A} \mid \forall o \in O : (o, a) \in R \,\}$$

Analogously, the set of common objects, τ, for a set of attributes, A ⊆ 𝒜, is defined as:

$$\tau(A) = \{\, o \in \mathcal{O} \mid \forall a \in A : (o, a) \in R \,\}$$

[…] a partial order via:

$$(O_1, A_1) \le (O_2, A_2) \iff O_1 \subseteq O_2$$

or equivalently with

$$(O_1, A_1) \le (O_2, A_2) \iff A_1 \supseteq A_2.$$

If c1 ≤ c2 holds, then c1 is called a subconcept of c2 and c2 is called a superconcept of c1. For instance, ({o2, o4}, {a3, a4, a5}) ≤ ({o2, o3, o4}, {a3, a4}) is true in Table 2.

The set of all concepts of a given formal context and the partial order ≤ form a complete lattice, called concept lattice L:

$$L(C) = \{\, (O, A) \in 2^{\mathcal{O}} \times 2^{\mathcal{A}} \mid A = \sigma(O) \wedge O = \tau(A) \,\}$$

[…]

$$\gamma(o) = \bigwedge \{\, c \in L(C) \mid o \in \mathit{extent}(c) \,\} \qquad (2)$$

We will call a graph representing a concept lattice using this marking strategy a sparse representation. The equivalent sparse representation for Figure 2 is shown in Figure 3. The content of a node N in this representation can be derived as follows:
• the objects of N are all objects at and below N,
• the attributes of N are all attributes at and above N.
For instance, the node in Figure 3 marked with o2 and a5 is the concept ({o2, o4}, {a3, a4, a5}).

[Figure 3, visible node labels: a1, a2; a3, a4; a5; a6, a7, a8]

[…] complete documentation exists. However, if cohesive modules and subsystems are not known in advance, one would hardly make the effort to analyze a large system to obtain these in order to apply concept analysis to get the feature component map, because it is not yet clear which components are relevant at all, and reverse engineering of the complete system first will likely not be cost-effective. Only later, if the retrieved feature component map (using simpler definitions of components, like those in (2) or (3)) clearly shows which lower-level components should be investigated further to obtain composite components, may reverse engineering generally pay off (in order to detect cohesive modules, we have developed a semi-automatic method integrating many automatic state-of-the-art […]

[…] and statically accessed for each executed subprogram also to be dynamically accessed (all transitively accessed variables will automatically be considered because all executed subprograms are examined). In practice, this analysis may be a sufficient approximation. But one should be aware that it may overestimate references, because variable accesses may be included that are on paths not executed at runtime, and it will also ignore references to variables by means of aliases if the simple static dependency analysis does not take aliasing into account. For a first analysis to obtain a simplified feature component map, one can also ignore variables and come back to these in a later phase using more sophisticated dynamic or static analyses.

[…] nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.
• Components required for all features can be found at the bottom element.
• Features that require all components can be found at the top element.
• If the top element does not contain features, then all components in the top element are superfluous (such components will not exist when the set of objects for concept analysis contains only components executed at least once, which is the case if a filter ignores all subprograms for which the profiler reports an execution […]

[…]
• Gnu C compiler gcc to compile the system using a command line switch for generating profiling information,
• Gnu object code viewer nm and a short Perl script in order to identify all functions of the system (as opposed to those included from standard libraries),
• Gnu profiler gprof and a short Perl script to ascertain the executed functions in the execution trace,
• concept analysis tool concepts […]
• graph editor Graphlet [3] to visualize the concept lattice,
• and two more short Perl scripts to convert the file […]

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our case study was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig. Hence, we considered as main features the following two capabilities:
1. ability to draw different shapes (lines, curves, rectangles, etc.)
2. ability to modify shapes in different editing modes […]

First experiment. In our first experiment, we prepared 15 scenarios. Each scenario invokes Xfig, performs the drawing of one of the objects Xfig provides, and then terminates Xfig, i.e., the aspects above were not combined and no other functionality of Xfig was used. We used all shapes of Xfig’s drawing panel shown in Figure 5 except picture objects and library objects.

[…] This lattice consists of 22 concepts, three of them provide the specific functionality for the respective shapes. […]

[…] components with other features. Concept #5 represents the two features “draw polyline” and “draw polygon”. The only difference between these two features is that an additional line is drawn that closes a polygon. This difference is not visible in the concept lattice since the two features are attached to the same concept. The distinction is made in the body of the function that is called to draw either a polygon or a polyline. Concept #3 denotes the feature “draw spline”. Concept #4 has no feature attached and represents the components shared for drawing polygons, polylines, and splines. These components are no real drawing operations but operations to keep a log of the points set by the user and to draw lines between set points while the user is still setting points (a […]

[Figure labels: circle by radius, circle by diameter, ellipse by radii, ellipse by diameters, circles and ellipses, closed approx. spline, approximated spline, closed interpol. spline, interpolated spline]
Figure 6. Relevant parts for […]

[…] Nodes #41, #42, #43, and #44 represent the features to draw circles and ellipses using either diameter or radius. […]

Figure 7. Concept lattice for second experiment.

Related Research

The mathematical foundation of concept analysis was laid by Birkhoff in 1940. Primarily Snelting has recently introduced concept analysis to software engineering. Since then it has been used to evaluate class hierarchies [15], explore configuration structures of preprocessor statements [10, 14], and to recover components [4, 7, 12, 13].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method, in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to quickly and cheaply derive the feature component map. Moreover, […]

[…] Note that our technique achieves the same effect by considering several execution traces for different features at a time. Components not specific to a feature will “sink” in the concept lattice, i.e., will be closer to the bottom element. More precisely, recall from Section 3.2 that a component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

One may argue that components that are only required to get the system started, but are not, strictly speaking, directly necessary for any feature will still appear in the concept lattice when we do not subtract execution traces for an excluding input set. It is true that these components cannot be distinguished from components that in fact contribute to all components, because both kinds of components […]

[…] between components (feature-feature dependencies are derived from an existing system and, hence, may only exist for this particular system but not necessarily for these features in general).

The technique is primarily suited for functional features that may be mapped to components. In particular, non-functional features do not easily map to components. For example, for applications for which timing is critical (because it may result in diverging behavior), the features would also have to take time into account.

Note also that the technique is not suited for features that are only internally visible, like whether a compiler uses a certain intermediate representation. Strictly speaking, internal features may be viewed as implementation […]

[…] and the combination of them. Scenarios that cover too much functionality in one step or the clumsy combination of scenarios will result in huge and complex lattices that are unreadable for humans. Moreover, the number of usage scenarios increases tremendously when features are combined.

In our case study, the method provided us with valuable insights. The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively; e.g., the abilities to draw text and circles/ellipses are widely independent from other shapes. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation of […]

[8] Lindig, C., Concepts, ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.
[9] Koschke, R., ‘Atomic Architectural Component Recovery for Program Understanding and Evolution’, Dissertation, Institut für Informatik, Universität Stuttgart, 2000, http://www.informatik.uni-stuttgart.de/ifi/ps/rainer/thesis.
[10] Krone, M. and Snelting, G., ‘On the Inference of Configuration Structures From Source Code’, Proc. of the Int. Conference on Software Engineering, pp. 49-57, May 1994, IEEE Computer Society Press.
[11] Perry, D., ‘Generic Architecture Descriptions for Product Lines’, Proc. of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.
[12] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F. (1997), ‘Applying Concept Formation Methods to Object Identification […]
of time, a shared architecture for a product family evolvedmats of concepts and Graphlet (all Perl scripts                                                                                                                                                                                                                                                                                                                                           polygon                                                                                                                                        polyline         spline first appears as polygon and is only re-shaped when              They all contain three specific components to draw the                                                                                                                                                 nents jointly appear at the bottom element. However, the              details. However, such implementation details may be of          Xfig. The lattice also classified components according to
                                                                   ponent map. On the basis of the feature component map                                                                                     executed when f is invoked.                                                     The infimum of two concepts in this lattice is com-                        o1          o2              o3                                 techniques [9]).                                                                                                                       tion count of 0).                                                                                                                      (rotate, move, copy, scale, etconcept #1 (21 functions) depicts the functionality for                                                                                                                                                                                                                                                                                                     tion in Procedural Code’, Proc. of the Conference on Auto-
Moreover, large investments impose a reluctance against                                                                               feature F                                                                                                                                                                                                                                                                                                                                                           3.2. Interpretation of the Concept Lattice                                                                                              together have just 147 LOC).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        rectangular box       the user has set all points).                                          object, to plot an elastic bend while the user is drawing,                                                                          the method relies on the quality of the static dependency         idea of an excluding input set can be taken over to our               interest for defining a product family architecture. Internal     their abstraction level, which is a useful information for
                                                                   and additional economic reasons, a decision is made for                                                                                                                                                                puted by intersecting their extents as follows:                                                     orectangular box                                                                                                                                                                                                                                                                                                                splines and concept #2 (17 functions) represents the one                                                                                                                                                                                                                                                                                                    mated Software Engineering, Nevada, pp. 210-218,
introducing a product family approach that ignores exist-                                                                                  usage scenario                                                     However, here – for the time being – we will use as an                                                                                                                                       <                             Alternative (2) can be chosen if suitable documentation                                                                           • If the bottom element does not contain any compo-
he fact that the subprograms are extracted from the
e conducted two experiments. In the first one, we                                                                                                                                                                                                                                                                                                                                                                                                                with rounded corners       Concept #2 stands for the feature “draw arc” and con-              and to resize the object. Note the similarity of the compo-                                                                         graph. If this graph, for example, does not contain infor-        technique to distinguish these two kinds of components by             features can only be detected by looking at the source,          re-use; general components can be found at the lower                         November, IEEE Computer Society.
ing assets. Hence, an introduction of a product family             particularly interesting and required components, and fur-
                                                                                                                                                                                                          abstract example the binary relation between arbitrary                                 ( O 1, A 1 ) ∧ ( O 2, A 2 ) = ( O 1 ∩ O 2, σ ( O 1 ∩ O 2 ) )                                                                         is not available but there is reason to trust the program-             Concept analysis applied to the formal context                  nent, all features in the bottom element are not imple-                                                                            investigated the ability to draw different shapes only. In                                                                                                                                                                                                                                         regular polygon                                                                                                                                                            arc
for lines (used for polygons). Both are dependent on con-             mation on potential values of function pointers, the human
                                                                                                                                                execution trace                                                                                                                                                                                                        Figure 3. Sparse representation of Figure 2.                                                                                                                                                                                                                            object code makes the implementation independent from                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        cept #7 is again a concept that represents shared compo-               nent names. The specific commonalities among circles and                                                                                                                                               providing a usage scenario in which no feature is invoked,            because it is not clear how to invoke them from outside          level, specific components at the upper level. Moreover,
                                                                   ther expensive analyses regarding quality can be cost-                                                                                                                                                                                                                                                                                                             mers of the system to a great extent. In all other cases, one       described in the last section gives a lattice, from which          mented by the system (this constellation will not                                                                                  the second one, we analyzed the ability to modify shapes.                                                                                                                                                                                                                                           picture object                                                                                                                                                           text                                                                                                                                                cept #4 (29 functions) that groups functions related to               analyst may miss functions only called via function point-                                                                                                                                                                                                                        [13]Siff, M. and Reps, T., ‘Identifying Modules via Concept
approach has generally to cope with existing code.                                                                                                 required components C1 …Cn                             objects and attributes shown in Table 1. An object oi has                         The infimum describes a set of common attributes of                                                                                                                                                                                                                                                                                                 the programming language to a great extent (as long as the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   nents for drawing elastic lines while the user is setting              ellipses are represented by node #38, which introduces the                                                                                                                                            like simply starting and immediately shutting down the                and how to derive from an execution trace whether these          the lattice showed dependencies among components,                            Analysis’, Proc. of the Int. Conference on Software Mainte-
                                                                   effectively aimed at selected components.                                                                                                                                                                                                                                                                                                                          will fall back on alternative (3). However, for alternative         interesting relationships can be derived. These relation-          exist, if there is a usage scenario for each feature and                                                                           The second experiment exemplifies combined features                                                                                                                                                                                                                                                                                                                                                                                                                    library object
points. Concept #3 (20 functions) denotes the ellipse fea-            ers. At the other extreme, if the too conservative assump-
    Reverse engineering may help creating a product fam-                                                                                                                                                  attribute aj if row i and column j is marked with an ! in                       two sets of objects. Similarly, the supremum is deterpoints. The difference between concept #7 and concept #4               shared components to draw circles and ellipses (both spec-                                                                                                                                            system without invoking any relevant feature. That simple             features are present or not. However, we assume that             which need to be known when components are to be
                                                                       This paper describes a quickly realizable technique to                           (F, C1), …(F, Cn) ∈R                                                                                                                                                                                        3. Feature Component Map                                          (3), concept analysis may additionally yield hints on sets          ships can be fully automatically derived and presented to          every usage scenario is appropriate and relevant to the           language is compiled to object code) and has the advan-          composed by basic features. For the second experiment, ature, concept #5 (29 functions) the general drawing sup-                                                                                                                                                                                                                                                                                                    nance, Bari, pp. 170-179, October, 1997, IEEE Computer
ily for existing systems by identifying and analyzing the                                                                                                                                                                                                                                 mined by intersecting the intentsis that the former only contains the components to draw                ified by diameter and radius).                                                                                                       tion is made that every function whose address is taken is        trick separates the two kinds of components in two distinct           externally visible features are generally more important.        extracted.                                                                   Society.
                                                                   ascertain the feature component map based on dynamic                                      concept analysis                             Table 1 (the example stems from Lindig and Snelting [7]).                                                                                                                                                                                                                                       the analyst such that the more complicated theoretical             system; a system may indeed not have all features,                tage that no front end is necessary. On the other hand,                                                                                                                                                                                                                                                                                                                                                      Figure 5. Xfig’s object shapes.                                                                                                                                                                                                                                                       port functionality and concept #6 (123 functions) the start-
components and also by deriving the individual architec-                                                                                                                                                                                                                                                                                                                                                                              of related subprograms forming composite components.                                                                                                                                                                                                                      shape is drawn and then modified. Both draw and modify                                                                                                                                                                                                                                                                                                                                                                                                                                       the elastic line, while the latter adds the capability to set an           Nodes #32 and #39 connect the circles and ellipses to                                                                           called at each function pointer call site, the search space       concepts, C1 and C2, in the lattice where C1 < C2 and                     The invocation for externally visible features is com-          As future work, we want to explore how results
                                                                   information (gained from execution traces) and concept                                          feature component map                  For instance, the following equations hold for this table,                             ( O 1, A 1 ) ∨ ( O 2, A 2 ) = ( τ( A 1 ∩ A 2), A 1 ∩ A 2 )            In order to derive the feature component map via con-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          background can be hidden. The only thing an analyst has            i.e., a usage scenario may be meaningless for a given             because a compiler may replace source names by link                                                                                                                                                                                                                                                                                                                   The resulting lattice for this experiment is shown in                                                                                                                                                                                                                                                                       up and initialization code of the system.                                                                                                                                                                                                                                                                                                               [14]Snelting, G., ‘Reengineering of Configurations Based on
ture from each system. These individual architectures may                                                                                                                                                                                                                                                                                                                                                                                The relation for the formal context necessary for con-                                                                                                                                                                                                                 constitute a basic feature. Combined features add to the                                                                                                                                                                                                                                                                                                                                                                                                                                    arbitrary number of points. Splines do not need this capa-             the other objects. No components are attached to nodes                                                                              increases extremely. Generally, it is statically undecidable                                                                            paratively simple when a graphical user interface is avail-      obtained by the method described in this paper may be                        Mathematical Concept Analysis’, ACM Transactions on Soft-
                                                                   analysis. The technique is automatic to a great extent.                                         and dependencies                       also known as relation table:                                                                                                                             cept analysis, one has to define the formal context
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          to know is how to interpret the derived relationships. This                                                                          names in the object code (for instance, C++ compilers use                                                                                                                                                                                                                                                                                                         Figure 4. The contents of the concepts in the lattice are                                                                                                                                                                                                                                                                           Analyzing concepts #1, #2, and #3, we found that the                                                                                C1= ⊥ and C2 contains only those components that are
…then be unified to a platform architecture and the derived components may be used to populate the unified architecture. To this end, code needs to be adjusted, reengineered, or wrapped. However, changing or wrapping the code is only done in very late phases in moving toward a product family. Reverse engineering can also assist in earlier phases and, thus, Bayer et al. rightly demand an early integration of reverse engineering into a product family approach [1]. Early reverse engineering is needed to derive first coarse information on existing system components (assets) timely needed by a product family analyst to investigate feasibility and to estimate costs of different alternative ways to get to a suitable product family architecture…

Concept analysis is a mathematical technique to investigate binary relations (see Section 2).

Figure 1. Overview.

Integration into a Product Family Process. A simple process for feature-based reengineering toward product families can be described as follows:
1. The economically relevant features are ascertained by product family engineers and market analysts.
2. The feature component map is derived based on the identified relevant features.
3. The previously derived feature component map gives additional insights into dependencies among features…

We want to point out that not all non-functional requirements, e.g., time constraints, can be easily mapped to components, i.e., our technique primarily aims at functional features. However, in some cases, it is possible to isolate non-functional aspects, like security, in code and map them to specific components. For instance, one could concentrate all network accesses in one single component to enable controlled secure connections.

The remainder of this article is organized as follows. Section 2 introduces concept analysis. Section 3 explains how concept analysis can be used to derive the feature…

…σ({o1}) = {a1, a2} and τ({a7, a8}) = {o3, o4}

Table 1: Example relation.
      a1  a2  a3  a4  a5  a6  a7  a8
  o1  ×   ×
  o2          ×   ×   ×
  o3          ×   ×       ×   ×   ×
  o4          ×   ×   ×   ×   ×   ×

A pair (O, A) is called concept if A = σ(O) ∧ O = τ(A) holds, i.e., all objects share all attributes. For a concept c = (O, A), O is the extent of c, denoted by extent(c), and A is…

…The supremum ascertains the set of common objects, which share all attributes in the intersection of two sets of attributes.

Graphically, the concept lattice for the example relation in Table 1 can be represented as a directed acyclic graph whose nodes represent concepts and whose edges denote the superconcept/subconcept relation < as shown in Figure 2. The most general concept is called the top element and is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥.

The combination of the graphical representation in Figure 2 and the contents of the concepts in Table 2 together form the concept lattice. The complete information…

…(objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Features and Components

Components will be considered objects of the formal context, whereas features will be considered attributes. Note that in the reverse case, the concept lattice is simply inverted but the derived information will be the same. The set of relevant features will be determined by the product family experts. For components, we can consider the following alternatives depending on how much knowledge on the system architecture is already available:

…concept analysis is defined as follows:

(C, F) ∈ R if and only if component C is required when feature F is invoked; a subprogram is required when it needs to be executed; a global variable is required when it is accessed (used or changed); a composite component is required when one of its parts is required.

In order to obtain the relation, a set of usage scenarios needs to be prepared where each scenario exploits preferably only one relevant feature. Then the system is used according to the set of usage scenarios, one at a time, and the execution traces are recorded. An execution trace contains all required low-level components for a usage scenario…

…section explains how interesting relationships can be automatically derived.

As already abstractly described in Section 2, the following base relationships can be derived from the sparse representation of the lattice (note the duality in the interpretation):
• A component, c, is required for all features at and above γ(c) – as defined by (1) – in the lattice.
• A feature, f, requires all components at and below µ(f) – as defined by (2) – in the lattice.
• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top…

…system).

Beyond these relationships between components and features, further useful aspects between features on one hand and between components on the other hand may be derived:
• If γ(c1) < γ(c2) holds for two components c1 and c2, then component c2 requires component c1.
• If µ(f1) < µ(f2) holds for two features f1 and f2, then feature f1 is based on feature f2.

One has to note that the latter relationship between features safely holds for the analyzed system only, i.e., this relationship is not necessarily true for the features as such,…

…name mangling to resolve overloading) there is not always a direct mapping from the subprograms in the execution trace back to the original source. Because we dealt in our case study with C code, object code names were identical to source names. If this is not the case, one either tolerates divergences between names (mostly, names are similar enough) or has to reverse name mangling.

4. Case Study

As a case study, we analyzed the Xfig system [18] (version 3.2.1) consisting of about 76 KLOCs written in the programming language C. In this section, we will…

…effort needed to derive the feature component map as there are many possible combinations…

In both experiments, we considered subprograms as components. However, in our simple implementation, we do not handle variable accesses. Hence, not all required low-level components are detected.

The resulting concepts contain subprograms grouped together according to their usage for features. Note that the more general subprograms can be found at the lower concepts in the lattice since they are used for many features, while specific components are in the upper region of the lattice. Hence, the concept lattice also reflects the level of abstraction of these subprograms within the given set of…

…omitted for readability reasons. However, their size in this picture is a linear function of their number of components (except for the bottom element that contains 136 components, mostly initialization and GUI code and very basic functions, and was too large to be drawn accordingly; as a comparison point: the text drawing concept, marked as node #1, has 29 components). As Figure 4 shows, there are a few concepts containing most of the components (i.e., subprograms) of the system. The lattice contains 47 concepts. 26 of them introduce at least one new component, i.e., to these nodes, a component is attached (more precisely, a concept C introduces a component if there exists a…

Concept #6 represents the feature “draw lines” and is used for drawing rectangles, polygons, and polylines, as one would expect. The generality of this feature becomes immediately obvious in the concept lattice as it is located in the middle level of the lattice.

The framed area in Figure 4 has a simpler structure than the rest of the lattice. This part deals with circles and ellipses and its details are shown in Figure 6. Each node in Figure 6 contains two sets: The upper set contains all components attached to the node, i.e., those components, c, for which γ(c) = N; the lower set contains all features of N, including those inherited from other concepts. The…

…#32 and #39, they only merge components from different concepts. The two nodes have a direct infimum (not shown in Figure 6) and add the same components to the circle and ellipse features. The components inherited via these two nodes are very basic components of the lowest regions of the lattice, which indicates that ellipses and circles are widely separate from all other objects.

Second experiment. In a second experiment, we analyzed the edit mode rotate, which comes in two variants: clockwise and counterclockwise. The first ten shapes in Figure 5 were drawn and rotated once clockwise and once counterclockwise, which resulted in 20 scenarios. The…

…bility because they are defined by exactly three points.

…shapes provide individual rotate functions. In other words, the rotate feature is implemented specifically for each shape, i.e., there is no generic component that draws all different shapes, which would have been an interesting finding in terms of reuse.

General observations. We made the experience that applying our method is easy in principle. However, running all scenarios by hand is time consuming. It may be facilitated by the presence of test cases that allow an automated replay of various scenarios. Because Xfig has a GUI, running a single scenario by hand is an easy task. However, one has to pay attention not…

…able (as it was the case in our case study). Then, usually only a menu selection or a similar interaction is necessary. In the case of a batch system, one may vary command line switches and may have to provide different sets of test data to invoke a feature. However, in order to find suitable test data, one might need some knowledge of internal details of a system.

The implementation of this technique was surprisingly simple. We opportunistically put together a set of publicly available tools and wrote a few Perl scripts (140 LOC in total) for interoperability, which took us just one day. A drawback of our simple implementation is that one has to run the system for each usage scenario from the beginning…

…which paths are taken at runtime, so that every static analysis will yield an overestimated search space, whereas dynamic analyses exactly tell which parts are really used at runtime (though for a particular run only). However, Chen and Rajlich’s technique could be helpful in a later phase, in which the system needs to be more rigorously analyzed. The purpose of our technique is to derive the feature component map. It handles the system as a black box and, hence, does not give insights into internal aspects with respect to quality and effort.

Wilde and Scully [17] also use dynamic analysis to localize features as follows:

The invoking input set I (i.e., a set of test cases or – in…

…really required for all components in a narrower sense.

Furthermore, our technique goes beyond Wilde and Scully’s technique in that it also allows one to derive relevant relationships between components and features by means of concept analysis, whereas Wilde and Scully’s technique only localizes a feature. The derived relationships are important information for product family experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions

A feature component map describes which components…

…combined with results of additional static analyses. For example, we want to investigate the relation between the concept lattice based on dynamic information and static software architecture recovery techniques.

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., ‘Transitioning Legacy Assets - a Product Line Approach’, Proceedings of the SIGSOFT Foundations of Software Engineering, Toulouse, pp. 446-463, Association for Computing Machinery (ACM), 1999.
[2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Study’, Proc. of the 21st International Conference on Software Engineering (ICSE’99), (Los Angeles, CA, USA), pp. …
…Software Engineering and Methodology 5, 2, pp. 146-189, April, 1997.
[15] Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies Using Concept Analysis’, Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 99-110, November, 1994.
[16] Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Techniques and Process Aspects for Product Line Development’, Proc. of the 10th International Software Process Workshop, June 1996, Ventron FR.
[17] Wilde, N. and Scully, M.C., ‘Software Reconnaissance: Mapping Program Features to Code’, Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.
[18] Xfig system, http://www.xfig.org.





Unfortunately, the reviewers did not like the paper very much.
It was accepted only as a short paper.
We took the comments of the reviewers seriously, improved the paper, and
added more case studies.
It was a complete rewrite, but the essential idea survived.
        Derivation of Feature Component Maps by means of Concept Analysis
                               Thomas Eisenbarth, Rainer Koschke, Daniel Simon
                  University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
                         {eisenbts, koschke, simondl}@informatik.uni-stuttgart.de
and components and, hence, into feasibility and costs of different alternative product family platforms. The knowledge gained from the feature component map and additional economic considerations may lead to a further selection of only a certain subset of all features and their corresponding components.
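Once a subset of features has been selected, the components to carry over follow directly from the feature component map. A minimal sketch, assuming the map is given as a dictionary from each feature to the set of components its scenario executed; the function names and the example features f1 to f3 and components c1 to c4 are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical feature component map: feature -> components executed for it.
FeatureMap = Dict[str, Set[str]]


def required_components(selection: Iterable[str], fmap: FeatureMap) -> Set[str]:
    """Union of the components needed by every selected feature."""
    required: Set[str] = set()
    for feature in selection:
        required |= fmap[feature]
    return required


def unneeded_components(selection: Iterable[str], fmap: FeatureMap) -> Set[str]:
    """Components used only by features outside the selection."""
    all_components: Set[str] = set().union(*fmap.values())
    return all_components - required_components(selection, fmap)


if __name__ == "__main__":
    fmap = {
        "f1": {"c1", "c2"},
        "f2": {"c1", "c3"},
        "f3": {"c1", "c3", "c4"},
    }
    print(sorted(required_components(["f1", "f2"], fmap)))
    print(sorted(unneeded_components(["f1", "f2"], fmap)))
```

Selecting f1 and f2 here keeps c1, c2, and c3 and leaves c4 out. This holds only with respect to the scenarios that were actually run, not for the features as such.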
component map and Section 4 describes our experience with this technique in a case study. Section 5 discusses related research.

2. Concept Analysis
the intent of c, denoted by intent(c).
Informally, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. For example, Table 2 contains the concepts for the relation in Table 1.
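The maximal rectangles can be computed mechanically. The following is a minimal sketch, not the authors' tool. It assumes the relation is given as a dictionary from objects to attribute sets, with components as objects and features as attributes, which matches the use of γ(c) for components and µ(f) for features below. The small relation is hypothetical and is not Table 1. The sketch uses the fact that every concept intent is an intersection of object intents, where the empty intersection is the full attribute set.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(relation: Dict[str, Set[str]],
                    attributes: Iterable[str]) -> List[Concept]:
    """All concepts of a binary relation given as object -> attribute set."""
    # Start with the empty intersection, i.e., the full attribute set.
    intents: Set[FrozenSet[str]] = {frozenset(attributes)}
    for obj_attrs in relation.values():
        obj_intent = frozenset(obj_attrs)
        # The comprehension is evaluated before the union, so reading
        # `intents` while building it is safe.
        intents |= {obj_intent & known for known in intents}
    concepts: List[Concept] = []
    for intent in intents:
        # Extent: all objects that possess every attribute of the intent.
        extent = frozenset(o for o, attrs in relation.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Larger extents first, which roughly follows the lattice top-down.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


if __name__ == "__main__":
    relation = {
        "c1": {"f1", "f2", "f3"},
        "c2": {"f1", "f2"},
        "c3": {"f2", "f3"},
        "c4": {"f3"},
    }
    for extent, intent in formal_concepts(relation, {"f1", "f2", "f3"}):
        print(sorted(extent), sorted(intent))
```

For this relation the sketch finds six concepts. For example, ({c1, c2, c3}, {f2}) is the maximal rectangle of components that share feature f2. Closing intents under intersection is exponential in the worst case, so it suits only small illustrative tables.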
[Figure: concept lattice with concepts C1 to C7]
1. cohesive modules and subsystems as defined and documented by the system’s architects or re-gained by reengineers; modules and subsystems will be considered composite components in the following;
2. physical modules, i.e., modules as defined by means of the underlying programming language or simply
An execution trace contains all required low-level components for a usage scenario or an invoked feature, respectively. If composite components are used for concept analysis, the execution trace containing the required low-level components induces an execution trace for composite components by replacing each low-level component with the composite component to which it belongs. Hence, each system run
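• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.
• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).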
• Features, to which two components, c1 and c2, jointly
because the relationship was derived only from a specific
implementation.
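The two specificity criteria above reduce to simple set tests. The features on the paths from γ(c) to the top are exactly intent(γ(c)) = {c}', the features whose scenarios executed c. The components on the paths from µ(f) to the bottom are exactly extent(µ(f)) = {f}'. A minimal sketch, assuming the same component-to-feature dictionary as the input; the function names and the example relation are hypothetical.

```python
from typing import Dict, Set


def components_specific_to_one_feature(
        relation: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """feature f -> components c with {c}' == {f} (c is used by f only)."""
    result: Dict[str, Set[str]] = {}
    for comp, feats in relation.items():
        if len(feats) == 1:
            (feature,) = feats  # unpack the single feature
            result.setdefault(feature, set()).add(comp)
    return result


def features_specific_to_one_component(
        relation: Dict[str, Set[str]]) -> Dict[str, str]:
    """feature f -> its single component c, if {f}' == {c}."""
    users: Dict[str, Set[str]] = {}
    for comp, feats in relation.items():
        for feature in feats:
            users.setdefault(feature, set()).add(comp)
    return {f: next(iter(cs)) for f, cs in users.items() if len(cs) == 1}


if __name__ == "__main__":
    relation = {
        "c1": {"f1", "f2", "f3"},
        "c2": {"f1", "f2"},
        "c3": {"f2", "f3"},
        "c4": {"f3"},
        "c5": {"f4"},
    }
    print(components_specific_to_one_feature(relation))
    print(features_specific_to_one_component(relation))
```

In this example, c4 is specific to f3 and c5 to f4, and f4 is the only feature implemented by a single component. As noted above, such results hold for the analyzed implementation and scenarios only.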
The information described above can be derived by a tool and fed back to the product family expert. As soon as a decision is made to reuse certain features, all components required for these features (easily derived from the con-
[Figure: concept lattice for the Xfig case study; nodes are labeled with the scenario traces draw-rectangle.mon, draw-polyline.mon, and draw-polygone.mon and with Xfig functions such as draw_arcbox, create_lineobject, create_spline, create_arc, create_text, and circlebyradius_drawing_selected.]
(more precisely, a concept C introduces a component if there exists a component c for which γ(c) = C holds). Twenty-one of the concepts do not introduce any new component and merely merge functionality needed by several superconcepts.
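Whether a concept introduces a component can be checked without drawing the lattice. A concept is determined by its intent, and γ(c) is the concept whose intent is {c}', so a concept introduces a component exactly when its intent equals some component's feature set. A minimal sketch, assuming components as objects and features as attributes; the names and the toy relation are hypothetical and are not the Xfig data.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def concept_intents(relation: Dict[str, Set[str]],
                    features: Iterable[str]) -> Set[FrozenSet[str]]:
    """All concept intents: intersections of component intents."""
    intents: Set[FrozenSet[str]] = {frozenset(features)}
    for feats in relation.values():
        intents |= {frozenset(feats) & known for known in intents}
    return intents


def split_by_introduction(
        relation: Dict[str, Set[str]], features: Iterable[str]
) -> Tuple[Dict[FrozenSet[str], Set[str]], List[FrozenSet[str]]]:
    """Return (introduced, silent).

    introduced: intent of each concept C with gamma(c) = C for some
    component c -> the components C introduces.
    silent: intents of concepts that introduce no component.
    """
    introduced: Dict[FrozenSet[str], Set[str]] = {}
    for comp, feats in relation.items():
        # gamma(c) is the concept whose intent is {c}', the features of c.
        introduced.setdefault(frozenset(feats), set()).add(comp)
    silent = [i for i in concept_intents(relation, features)
              if i not in introduced]
    return introduced, silent


if __name__ == "__main__":
    relation = {
        "c1": {"f1", "f2", "f3"},
        "c2": {"f1", "f2"},
        "c3": {"f2", "f3"},
        "c4": {"f3"},
    }
    introduced, silent = split_by_introduction(relation, {"f1", "f2", "f3"})
    print(len(introduced), "introducing,", len(silent), "without new components")
```

Of the six concepts in this toy relation, four introduce a component and two (with intents {f2} and the empty set) introduce none.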
The first interesting observation is that concepts with many components can be found in the upper region, while in the lower region, the number of components decreases
names of the features correspond to the objects drawn via the panel in Figure 5; e.g., draw-ellipse-radius means that an ellipse was drawn where the radius was specified (as opposed to the diameter).
resulting lattice contained 55 concepts, most of which introduce no new component. We observed that the related shapes, i.e., the variants of splines, circles, ellipses, etc., were merged at the top of the lattice since they use almost the same components. To reduce the size of the lattice, we selected one representative among the related
to cause interferences by invoking irrelevant features. For instance, Xfig uses a balloon help facility that pops up a little window when the cursor rests for some time on a sensitive area of the GUI (e.g., over the button selecting the circle drawing mode). Sometimes the balloon help mechanism is triggered, introducing interferences between
our terminology – a set of usage scenarios) is identified that will invoke a feature.
2. The excluding input set E is identified that will not invoke a feature.
3. The program is executed twice, using I and E separately.
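Steps 1 to 3 leave two execution traces, one per input set. Their comparison can be sketched as a plain set difference in the spirit of Wilde and Scully's approach. This minimal sketch assumes each trace has already been reduced to the names of the subprograms it called. The function `feature_specific` and the trace contents (Xfig-like names) are hypothetical and serve only as illustration.

```python
def feature_specific(invoking_trace, excluding_trace):
    """Subprograms executed for the invoking input set I but never for E.

    Both arguments are iterables of subprogram names (an assumed trace format).
    """
    return set(invoking_trace) - set(excluding_trace)


# Hypothetical traces: I draws an ellipse, E draws only a line.
trace_I = ["main", "set_cursor", "create_ellipse", "draw_ellipse", "ellipse_bound"]
trace_E = ["main", "set_cursor", "create_line", "draw_line", "line_bound"]

print(sorted(feature_specific(trace_I, trace_E)))
# ['create_ellipse', 'draw_ellipse', 'ellipse_bound']
```

Shared infrastructure such as `main` drops out of the difference. This illustrates why such a comparison localizes feature-specific components rather than deriving every component a feature requires.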
are required to implement a particular feature and is needed at an early stage within a process toward a product family platform:
• to weigh alternative platform architectures,
• to focus further tasks – like quality assessment – on only those existing components that are needed to populate
to get an execution trace for each feature. A more sophisticated environment would allow recording of traces to be started and stopped at any time.
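Turning one trace per feature into the relation table can be sketched as follows. The sketch assumes each trace is simply a list of called subprogram names, so only calls are recorded and variable accesses or single statements are ignored. Each run then contributes the pairs (component, scenario) for exactly one scenario. The helpers `relation_table` and `print_table` and the scenario names are hypothetical and are not part of the described tool.

```python
from typing import Dict, Iterable, Set, Tuple

Relation = Set[Tuple[str, str]]  # pairs (component, scenario)


def relation_table(traces: Dict[str, Iterable[str]]) -> Relation:
    """One run per scenario yields one column: the subprograms called in that run."""
    return {(component, scenario)
            for scenario, calls in traces.items()
            for component in set(calls)}  # repeated calls collapse to one entry


def print_table(rel: Relation) -> None:
    """Print components as rows and scenarios as columns, marking related pairs."""
    components = sorted({c for c, _ in rel})
    scenarios = sorted({s for _, s in rel})
    width = max(len(c) for c in components)
    print(" " * width, *scenarios)
    for c in components:
        marks = ["x".center(len(s)) if (c, s) in rel else " " * len(s) for s in scenarios]
        print(c.ljust(width), *marks)


# Hypothetical traces for two usage scenarios (names are illustrative only).
traces = {
    "draw-line": ["main", "create_line", "draw_line", "draw_line"],
    "draw-ellipse": ["main", "create_ellipse", "draw_ellipse"],
}
rel = relation_table(traces)
print_table(rel)
```

Representing the table as a set of pairs keeps the "one column per run" idea explicit. Adding a scenario just adds its pairs.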
Our implementation only counts subprogram calls and ignores accesses to global variables and single statements or expressions. It might be useful to analyze at a finer
granularity when subprograms are interleaved, i.e., different strands of control with different functionality are united in a single subprogram, possibly for efficiency reasons. For instance, we found a subprogram in our case study that draws different kinds of objects. The function contained a large switch statement whose branches drew the specific kinds of objects. In the execution trace, ...

... and the number of interferences increases (an interference leads to an unstructured lattice; a lattice is said to be structured if it can be decomposed into independent sublattices that are connected via the top and bottom elements only).

That is to say, there are many specific operations and few shared operations, and the shared operations are really used for many features.

Figure 4. Lattice for the first experiment.
The taller a concept is, the more components it contains. Concept #1 in Figure 4 is the largest concept (excluding ...

[Figure: concept lattice for the Xfig case study; nodes are labeled with Xfig subprograms such as draw_line, create_ellipse, mouse_balloon, print_to_file, and main (see Figure 6).]

... shapes and re-ran the experiment with three shapes (ellipse, polygon, and open approximated spline). The resulting lattice is shown in Figure 7.

... features. Such effects affect the analysis because they introduce spurious connections between features. Fortunately, this problem can be partly fixed by providing a specific scenario in which only the accidentally invoked irrelevant feature is invoked. This leads to a refactored concept lattice that contains a new concept isolating the irrelevant feature and its components. In our example, ...

... By comparison of the two resulting execution traces, the components that implement the feature can be identified.

Wilde and Scully focus on localizing rather than deriving required components: for deriving all required components, the execution trace for the including input set is ...

... yields all required components for a single scenario that exploits one feature. Thus, a single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.

... the platform architecture,
• and to decide on further steps, like reengineering or wrapping.

4. The selected components are more closely analyzed, for instance, with respect to maintainability, extractability, and integrability.
5. A product family platform is designed. Alternatives for components to populate the product family platform are weighed: component extraction and reengineering, new development, integration of COTS, or ...

... concept lattice) form a starting point for further analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate effort for subsequent steps (wrapping, reengineering, or re-development from scratch).

Abstract
Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product family based on existing components. This paper describes a new technique to derive the feature component map and additional ...

One important piece of information for a product family analysis that tries to integrate existing assets is the so-called feature component map that describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) ...

The technique presented in this paper yields the feature component map automatically using the execution traces for different usage scenarios. The technique is based on ...

Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. It has already been used successfully in other fields of software engineering. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. This section describes concept analysis ...

Figure 2. Concept lattice for Table 1.
C1 ({o1, o2, o3, o4}, ∅)
C2 ({o2, o3, o4}, {a3, a4})
C3 ({o1}, {a1, a2})
C4 ({o2, o4}, {a3, a4, a5})
C5 ({o3, o4}, {a3, a4, a6, a7, a8})
C6 ({o4}, {a3, a4, a5, a6, a7, a8})

... can be visualized in a more readable equivalent way by marking only the graph node with an attribute a ∈ A whose represented concept is the most general concept that has a in its intent. Analogously, a node is marked with an object o ∈ O if it represents the most special concept that has o in its extent. The unique element μ in the concept lattice marked with a is therefore: ...

... contribute, can be identified by γ(c1) ∧ γ(c2). Graphically, one finds in the lattice the closest common node toward the top element, starting at the nodes to which c1 and c2, respectively, are attached. All features at and above this common node are those jointly implemented by these components.
• Components jointly required for two features, f1 and ...

... directly available as existing files (the distinction to cohesive modules is that one does not know a priori whether physical modules really group cohesive declarations; physical modules are the unscrutinized result of a programmer's way of grouping declarations, whether this makes sense or not);
3. subprograms, i.e., functions and procedures, and global ...

3.3. Implementation
An execution trace can be recorded by a profiler. However, most profilers only record subprogram calls but not accesses to variables. Instead of using a symbolic debugger, for example, that allows setting watchpoints on variable ...

... May 1999.
Brandenburg, F.J., 'Graphlet', Universität Passau, http://www.infosun.fmi.uni-passau.de/Graphlet/.
Canfora, G., Cimitile, A., De Lucia, A., and Di Lucca, G.A., 'A Case Study of Applying an Eclectic Approach to Identify Objects in Code', Workshop on Program Comprehension, pp. 136-143, Pittsburgh, 1999, IEEE Computer Society Press.
[5] Chen, K. and Rajlich, V., 'Case Study of Feature Location Using Dependence Graph', Proc. of the 8th Int. Workshop on Program Comprehension, pp. 241-249, June 10-11, 2000, Limerick, Ireland, IEEE Computer Society Press.
Graudejus, H., 'Implementing a Concept Analysis Tool for Identifying Abstract Data Types in C Code', master thesis, University of Kaiserslautern, Germany, 1998.

We submitted the new paper to ICSM and it received the best paper award.
interferences due to an accidentally invoked irrelevant fea-      sufficient. By subtracting all components in the execution             concept analysis, a mathematical sound technique to ana-         this subprogram showed up for all objects where in fact                  [7] Lindig, C. and Snelting, G., ‘Assessing Modular Structure of
                                                                   requirement (the term feature is intentionally weakly                 wrapping.                                                                   in more detail.                                                                                                                                                                                                                  bal variables of the system; subprograms and global                                                                                                                                                    The implementation of the described approach is sur-             firstly present a general overview of the results and sec-                                                                                                                                                                                                                                     scenarios. To identify all subprograms required for a sin-
                                                                                                                                                                                                                                                                                                                        (∅, {a1, a2, a3, a4, a5, a6, a7, a8})                                                                                                                                                        accesses, or even to instrument the code if no sophisticateding the bottom element). It exploits a single feature “draw                                                                                                                                                ture appeared only at the two layers directly on top of the       trace for the excluding input set from those in the execu-            lyze binary relations, which has the additional benefits to       only specific parts of it were actually executed.
                                                                                                                                                                                                                                                                                                                                                                                                    ∨ { c ∈ L(C ) a ∈ intent ( c ) }
dependencies utilizing dynamic information and conceptf2, are described by µ(f1) ∨ µ(f2); graphicallyegacy Code Based on Mathematical Concept Analysis’,
                                                                   defined because its exact meaning depends on the specific            6. A migration plan is prepared.                                                  Concept analysis is based on a relation R between a set                                                                                                                                                                       variables will be called low-level components in the                                                                                                                                                prisingly simple (if one already has a tool for concept
analysis. The method is simple to apply, cost-effective,                                                                                                                                                                                                                                                                                                                                  µ(a) =                                           (1)                                                                       profiler is available, one can also use a simple static                                                                                                                                                ondly go into further details for particular interesting                                                                                                                                                                                                                                         gle feature or a set of features, one can then analyze the                                                                                                                                 text object”. According to the lattice, the feature is largely                                                                                                                                             bottom element of the lattice, and could be more or less          tion trace for the invoking input set, only those compo-              reveal not only correspondences between features and                Furthermore, the success of the described approach                        Proc. of the Int. Conference on Software Engineering, pp.
                                                                   context). Components are computational units of a soft-                                                                                                                                                                                           Table 2: Concepts for Table 1.                                                                                                   following.                                                                                                                        depicted, one ascertains in the lattice the closest com-
largely language independent, and can yield results                                                                                     The technique described in this article is used to derive                    of objects O and a set of attributes A, hence R ⊆ O × A.                                                                                                     The unique element γ marked with object o is:                                                                                      dependency analysis: One considers all variables directly
analysis). Our prototype for a Unix environment is an            observations.                                                                                                                                                                                                                                                                                    concept lattice as described in Section 3.2.                                                                                                                                               independent from other features and shares only a few                                                                                                                                                      ignored.                                                          nents remain that specifically deal with the feature.                  components but also dependencies between features and            heavily depends on the clever choice of usage scenarios                      349-359, Boston, 1997.
                                                                   ware architecture (see Section 3.1). Because the feature                                                                                              The tuple C = (O, A, R) is called formal context. For a                        The set of all concepts of a given formal context forms                                                                                     Ideally, one will use alternative (1) when reliable and                                                                             mon node toward the bottom element starting at the                opportunistic integration of the following parts:
                                                                                                                                                                                                                                                                                                                                                                                                    ∧ { c ∈ L(C ) o ∈ extent ( c ) }
quickly and very early in the process.                                                                                               the feature component map which plays a central role                                                                                                                                                                                                                                                                                                                            and statically accessed for each executed subprogram also                                                                                                                                                Xfig is a menu-driven tool that allows the user to draw                                                                                                                                                                                                                                       First experiment. In our first experiment, we prepared 15                                                                                                                                   components with other features.                                                                                                                                                                                                                                                  Note that our technique achieves the same effect by               between components (feature-feature dependencies are             and the combination of them. Scenarios that cover too                    [8] Lindig, C., Concepts,
                                                                   component map is needed very early to trade off alterna-                                                                                          set of objects, O ⊆ O, the set of common attributes, σ, is                      a partial order via:                                                                 γ (o) =                                          (2)   complete documentation exists. However, if cohesive                                                                                    nodes to which f1 and f2, respectively, are attached; all          • Gnu C compiler gcc to compile the system using a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc.
                                                                                                                                     early in this process.                                                                                                                                                                                                                                                                                                                                                          to be dynamically accessed (all transitively accessed vari-                                                                                                                                           and manipulate objects interactively under the X Window                                                                                                                                                                                                                                          scenarios. Each scenario invokes Xfig, performs the draw-                                                                                                                                       Concept #5 represents the two features “draw polyline”                                                                                                 4
elated Research                                               considering several execution traces for different features           derived from an existing system and, hence, may only             much functionality in one step or the clumsy combination
                                                                   tives in good time, complete and hence time-consuming                                                                                             defined as:                                                                                                                                                                                                                  modules and subsystems are not known in advance, one                                                                                   components at and below this common node are those                   command line switch for generating profiling infor-
1. Introduction                                                    reverse engineering of the system is out of the question. In      Overview. The technique described here is based on the                                                                                                                 ( O 1, A 1 ) ≤ ( O 2, A 2 ) ⇔ O 1 ⊆ O 2 or equivalently with          We will call a graph representing a concept lattice                                                                                ables will automatically be considered because all exe-                                                                                                                                               System. Objects can be lines, polygons, circles, rectangles,                                                                                                                                                                                                                                     ing of one of the objects Xfig provides, and then termi-                                                                                                                                    and “draw polygon”. The only difference between these                                                                                                                                                                                                                        at a time. Components not specific to a feature will “sink”            exist for this particular system but not necessarily for these   of scenarios will result in huge and complex lattices that               [9] Koschke, R., ‘Atomic Architectural Component Recovery for
                                                                                                                                                                                                                                                                                                                                                                                                                                                 would hardly make the effort to analyze a large system to                                                                              jointly required for these features.                                 mationhe mathematical foundation of concept analysis was                                                                                                                                                                                                                               Program Understanding and Evolution’, Dissertation, Institut
                                                                                                                                                                                                                             σ ( O ) = { a ∈ A ∀( o ∈ O ) ( o, a ) ∈ R }                                                                                                       using this marking strategy a sparse representation. The                                                                              cuted subprograms are examined). In practice, this                                                                                                                                                    splines, text, and imported pictures. An interesting first                                                                                                                                                                                                                                       nates Xfig, i.e., the aspects above were not combined and                                                                                                                                   two features is that an additional line is drawn that closes a                                                                                                                                                                                                               in the concept lattice, i.e., will be closer to the bottom ele-       features in general).                                            are unreadable for humans. Moreover, the number of
                                                                   particular, the decision for a certain alternative will lead to   execution traces generated by a profiler for different usage                                                                                                            ( O 1, A 1 ) ≤ ( O 2, A 2 ) ⇔ A 1 ⊇ A 2 .                                                                                            obtain these in order to apply concept analysis to get thelaid by Birkhoff in 1940. Primarily Snelting has recently                                                                                                                                                                                                                             für Informatik, Universität Stuttgart, 2000,
    Developing similar products as product families prom-                                                                                                                                                                                                                                                                                                                      equivalent sparse representation for Figure 2 is shown in                                                                             analysis may be a sufficient approximation. But one               • Components required for all features can be found at               • Gnu object code viewer nm and a short Perl script in          task in our case study was to define what constitutes a fea-                                                                                                                                                                                                                                                                                                                                                                                                                                polygon. This difference is not visible in the concept lat-                                                                                                                                                                                                                  ment. More precisely, recall from Section 3.2 that a com-                 The technique is primarily suited for functional fea-        usage scenarios increases tremendously when features are
                                                                   a consolidation on specific economically important core            scenarios (see Figure 1). One scenario represents the invo-                         Analogously, the set of common objects, τ, for a set of                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            no other functionality of Xfig was used. We used allhttp://www.informatik.uni-stuttgart.de/ifi/ps/rainer/thesis.
ises several advantages over relatively expensive separate                                                                                                                                                                                                                                              If c1 ≤ c2 holds, then c1 is called a subconcept of c2                 Figure 3. The content of a node N in this representation          feature component map because it not yet clear which                should be aware that it may overestimate references                the bottom element.                                                  order to identify all functions of the system (as             ture. Clearly, the capability to draw specific objects, like                                                                                                                                                                                                                                     shapes of Xfig’s drawing panel shown in Figure 5 except                                                                                                                                     tice since the two features are attached to the same con-                                                                                                                                                  introduced concept analysis to software engineering. Since        ponent, c, is specific to exactly one feature, f, if f is the          tures that may be mapped to components. In particular            combined.
                                                                   components in many cases and hence to an exclusion of             cation of one single feature and yields all subprograms                         attributes, A⊆ A, is defined asrone, M. and Snelting, G., ‘On the Inference of Configura-
developments, like lesser costs and shorter time for devel-
                                                                                                                                     executed for this feature. These subprograms identify the                                                                                                       and c2 is called superconcept of c1. For instance,                        can be derived as follows:                                        components are relevant at all and reverse engineering of           because variable accesses may be included that are on            • Features that require all components can be found at                 opposed to those included from standard libraries),           lines, splines, rectangles, etc., can be considered a feature                                                                                                                                                                                                                                    picture objects and library objects.                                                                                                                                                       cept. The distinction is made in the body of the function                                                                                                                                                  then it has been used to evaluate class hierarchies [15],         only feature on all paths from γ(c) to the top element.               non-functional features do not easily map to components.            In our case study, the method provided us with valuable
opment, test, and maintenance. These advantages are                less important components. Any investment in a deep and                                                                                                   τ ( A ) = { o ∈ O ∀( a ∈ A ) ( o, a ) ∈ R }                                                                                                                                                                         the complete system first will likely not be cost-effective.                                                                                                                                                                                                               of Xfig. Moreover, one can manipulate drawn objects in                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 explore configuration structures of preprocessor state-                                                                                                                                                                                                                                tion Structures From Source Code’, Proc. of the Int. Confer-
So young researchers never give up!

based on the fact that all family members share a common infrastructure – also known as platform architecture. There are many approaches to newly developing product families from scratch [2, 11]. However, according to Martinez [16], most successful examples of product families at Motorola originated in a single separate product. Only in the course of time did a shared architecture for a product family evolve. Moreover, large investments create a reluctance to introduce a product family approach that ignores existing assets. Hence, the introduction of a product family approach generally has to cope with existing code.

Reverse engineering may help in creating a product fam-

costly pre-analysis of less important components would be in vain to a large degree. Instead, reverse engineering in early phases should provide information on the feature component map quickly and with simple means. To this end, the product line analyst passes all relevant features for which the necessary components need to be detected to the reverse engineer, who in turn delivers the feature component map. On the basis of the feature component map and additional economic considerations, a decision is made about particularly interesting and required components, and further expensive analyses regarding quality can be aimed cost-effectively at the selected components.

components (or are themselves considered components) required for a certain feature. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as on feature-feature and component-component dependencies.

[Figure: feature F, usage scenario, execution trace, required components C1 … Cn]

In Section 3.1, the formal context for applying concept analysis to derive the feature component map will be laid down as follows:

• components will be considered objects,
• features will be considered attributes,
• a pair (component c, feature f) is in relation R if c is executed when f is invoked.

However, here – for the time being – we will use as an abstract example the binary relation between arbitrary objects and attributes shown in Table 1. An object oi has attribute aj if the cell in row i and column j is marked in

({o2, o4}, {a3, a4, a5}) ≤ ({o2, o3, o4}, {a3, a4}) is true in Table 2.

The set of all concepts of a given formal context and the partial order ≤ form a complete lattice, called concept lattice L:

$$L(C) = \{ (O, A) \in 2^{O} \times 2^{A} \mid A = \sigma(O) \wedge O = \tau(A) \}$$

The infimum of two concepts in this lattice is computed by intersecting their extents as follows:

$$(O_1, A_1) \wedge (O_2, A_2) = (O_1 \cap O_2,\ \sigma(O_1 \cap O_2))$$

The infimum describes a set of common attributes of two sets of objects. Similarly, the supremum is deter-

• the objects of N are all objects at and below N,
• the attributes of N are all attributes at and above N.

For instance, the node in Figure 3 marked with o2 and a5 is the concept ({o2, o4}, {a3, a4, a5}).

Figure 3. Sparse representation of Figure 2.

Only later, if the retrieved feature component map (using simpler definitions of components, like those in (2) or (3)) clearly shows which lower-level components should be investigated further to obtain composite components, may reverse engineering generally pay off (to detect cohesive modules, we have developed a semi-automatic method integrating many automatic state-of-the-art techniques [9]). Alternative (2) can be chosen if suitable documentation is not available but there is reason to trust the programmers of the system to a great extent. In all other cases, one will fall back on alternative (3). However, for alternative

paths not executed at runtime, and it will also ignore references to variables by means of aliases if the simple static dependency analysis does not take aliasing into account. For a first analysis to obtain a simplified feature component map, one can also ignore variables and come back to them in a later phase using more sophisticated dynamic or static analyses.

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the last section gives a lattice from which interesting relationships can be derived. These relation-

the top element

• If the top element does not contain features, then all components in the top element are superfluous (such components will not exist when the set of objects for concept analysis contains only components executed at least once, which is the case if a filter ignores all subprograms for which the profiler reports an execution count of 0).
• If the bottom element does not contain any component, all features in the bottom element are not implemented by the system (this constellation will not exist if there is a usage scenario for each feature and

• GNU profiler gprof and a short Perl script to ascertain the executed functions in the execution trace,
• concept analysis tool concepts [8],
• graph editor Graphlet [3] to visualize the concept lattice,
• and two more short Perl scripts to convert the file formats of concepts and Graphlet (all Perl scripts together have just 147 LOC).

The fact that the subprograms are extracted from the object code makes the implementation independent of the programming language to a great extent (as long as the

different edit modes (rotate, move, copy, scale, etc.) with Xfig. Hence, we considered the following two capabilities as main features:

1. ability to draw different shapes (lines, curves, rectangles, etc.)
2. ability to modify shapes in different editing modes (rotate, move, copy, scale, etc.)

We conducted two experiments. In the first one, we investigated only the ability to draw different shapes. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features

[Figure: Xfig drawing features: circle by radius, circle by diameter, ellipse by radii, ellipse by diameters, closed approx. spline, approximated spline, closed interpol. spline, interpolated spline, polygon, polyline, rectangular box, rectangular box with rounded corners, regular polygon, arc, picture object, text, library object]

that is called to draw either a polygon or a polyline. Concept #3 denotes the feature “draw spline”. Concept #4 has no feature attached and represents the components shared for drawing polygons, polylines, and splines. These components are no real drawing operations but operations to keep a log of the points set by the user and to draw lines between set points while the user is still setting points (a spline first appears as a polygon and is only re-shaped when the user has set all points). Concept #2 stands for the feature “draw arc”, and concept #7 is again a concept that represents shared components for drawing elastic lines while the user is setting points. The difference between concept #7 and concept #4

Nodes #41, #42, #43, and #44 represent the features to draw circles and ellipses using either diameter or radius. They all contain three specific components: to draw the object, to plot an elastic bend while the user is drawing, and to resize the object. Note the similarity of the component names. The specific commonalities among circles and ellipses are represented by node #38, which introduces the shared components to draw circles and ellipses (both spec-

Figure 6. Relevant parts for circles and ellipses.

This lattice consists of 22 concepts, three of which provide the specific functionality for the respective shapes. Concept #1 (21 functions) depicts the functionality for splines, and concept #2 (17 functions) represents the one for lines (used for polygons). Both depend on concept #4 (29 functions), which groups functions related to points. Concept #3 (20 functions) denotes the ellipse fea-

Figure 7. Concept lattice for second experiment.

ments [10, 14], and to recover components [4, 7, 12, 13].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to quickly and cheaply deriving the feature component map. Moreover, the method relies on the quality of the static dependency graph. If this graph, for example, does not contain information on potential values of function pointers, the human analyst may miss functions only called via function pointers. At the other extreme, if the too conservative assump-

One may argue that components that are only required to get the system started, but are not – strictly speaking – directly necessary for any feature will still appear in the concept lattice when we do not subtract execution traces for an excluding input set. It is true that these components cannot be distinguished from components that in fact contribute to all components, because both kinds of components jointly appear at the bottom element. However, the idea of an excluding input set can be taken over to our technique to distinguish these two kinds of components by providing a usage scenario in which no feature is invoked, like simply starting and immediately shutting down the system without invoking any relevant feature. That simple

For example, for applications for which timing is critical (because it may result in diverging behavior), the features would also have to take time into account. Note also that the technique is not suited for features that are only internally visible, like whether a compiler uses a certain intermediate representation. Strictly speaking, internal features may be viewed as implementation details. However, such implementation details may be of interest for defining a product family architecture. Internal features can only be detected by looking at the source, because it is not clear how to invoke them from outside and how to derive from an execution trace whether these features are present or not. However, we assume that

insights. The lattice revealed dependencies among features for the Xfig implementation and the absence of such dependencies, respectively; e.g., the abilities to draw text and circles/ellipses are widely independent of other shapes. Related features were grouped together in the concept lattice, which allowed us to compare our mental model of a drawing tool to the actual implementation of Xfig. The lattice also classified components according to their abstraction level, which is useful information for re-use; general components can be found at the lower level, specific components at the upper level. Moreover, the lattice showed dependencies among components, which need to be known when components are to be

ence on Software Engineering, pp. 49-57, May 1994, IEEE Computer Society Press.
[11] Perry, D., ‘Generic Architecture Descriptions for Product Lines’, Proc. of the Second International ESPRIT ARES Workshop, Lecture Notes in Computer Science 1429, pp. 51-56, Springer, 1998.
[12] Sahraoui, H., Melo, W., Lounis, H., and Dumont, F., ‘Applying Concept Formation Methods to Object Identification in Procedural Code’, Proc. of the Conference on Automated Software Engineering, Nevada, pp. 210-218, November 1997, IEEE Computer Society.
[13] Siff, M. and Reps, T., ‘Identifying Modules via Concept Analysis’, Proc. of the Int. Conference on Software Mainte-
                                                                       This paper describes a quickly realizable technique to                           (F, C1), …(F, Cn) ∈R                                                                                                                                                                                                   3. Feature Component Map                                          (3), concept analysis may additionally yield hints on sets          ships can be fully automatically derived and presented to          every usage scenario is appropriate and relevant to the           language is compiled to object code) and has the advan-          composed by basic features. For the second experiment, ature, concept #5 (29 functions) the general drawing sup-                                                                                                                                                                                                                                                                                                    nance, Bari, pp. 170-179, October, 1997, IEEE Computer
ily for existing systems by identifying and analyzing the                                                                                                                                                                                                                                            mined by intersecting the intents:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                is that the former only contains the components to draw                ified by diameter and radius).                                                                                                       tion is made that every function whose address is taken is        trick separates the two kinds of components in two distinct           externally visible features are generally more important.        extracted.                                                                   Society.
                                                                   ascertain the feature component map based on dynamic                                      concept analysis                                        Table 1 (the example stems from Lindig and Snelting [7]).                                                                                                                                                                                                                                       the analyst such that the more complicated theoretical             system; a system may indeed not have all features,                tage that no front end is necessary. On the other hand,                                                                                                                                                                                                                                                                                                                                                      Figure 5. Xfig’s object shapes.                                                                                                                                                                                                                                                       port functionality and concept #6 (123 functions) the start-
components and also by deriving the individual architec-                                                                                                                                                                                                                                                                                                                                                                                         of related subprograms forming composite components.                                                                                                                                                                                                                      shape is drawn and then modified. Both draw and modify                                                                                                                                                                                                                                                                                                                                                                                                                                       the elastic line, while the latter adds the capability to set an           Nodes #32 and #39 connect the circles and ellipses to                                                                           called at each function pointer call site, the search space       concepts, C1 and C2, in the lattice where C1 < C2 and                     The invocation for externally visible features is com-          As future work, we want to explore how results
                                                                   information (gained from execution traces) and concept                                          feature component map                             For instance, the following equations hold for this table,                             ( O 1, A 1 ) ∨ ( O 2, A 2 ) = ( τ( A 1 ∩ A 2), A 1 ∩ A 2 )            In order to derive the feature component map via con-
background can be hidden. The only thing an analyst has            i.e., a usage scenario may be meaningless for a given             because a compiler may replace source names by link                                                                                                                                                                                                                                                                                                                   The resulting lattice for this experiment is shown in                                                                                                                                                                                                                                                                       up and initialization code of the system.                                                                                                                                                                                                                                                                                                               [14]Snelting, G., ‘Reengineering of Configurations Based on
ture from each system. These individual architectures may                                                                                                                                                                                                                                                                                                                                                                                           The relation for the formal context necessary for con-                                                                                                                                                                                                                 constitute a basic feature. Combined features add to the                                                                                                                                                                                                                                                                                                                                                                                                                                    arbitrary number of points. Splines do not need this capa-             the other objects. No components are attached to nodes                                                                              increases extremely. Generally, it is statically undecidable                                                                            paratively simple when a graphical user interface is avail-      obtained by the method described in this paper may be                        Mathematical Concept Analysis’, ACM Transactions on Soft-
                                                                   analysis. The technique is automatic to a great extent.                                         and dependencies                                  also known as relation table:                                                                                                                             cept analysis, one has to define the formal context
to know is how to interpret the derived relationships. This                                                                          names in the object code (for instance, C++ compilers use                                                                                                                                                                                                                                                                                                         Figure 4. The contents of the concepts in the lattice are                                                                                                                                                                                                                                                                           Analyzing concepts #1, #2, and #3, we found that the                                                                                C1= ⊥ and C2 contains only those components that are
then be unified to a platform architecture and the derived                                                                                                                                                                                                                                                 The supremum ascertains the set of common objects,                                                                                     cept analysis is defined as follows:                                                                                                    system).                                                                                                                           effort needed to derive the feature component map as there                                                                                                                                                                                                                                                                                                                                                                                                                                  bility because they are defined by exactly three points.                #32 and #39, they only merge components from different                                                                              which paths are taken at runtime, so that every static anal-                                                                            able (as it was the case in our case study). Then, usually       combined with results of additional static analyses. For                     ware Engineering and Methodology 5, 2, pp. 146-189, April,
                                                                   Concept analysis is a mathematical technique to investi-                                                                                                     σ ( { o 1 } ) = { a 1, a 2 } and τ ( { a 7, a 8 } ) = { o 3, o 4 }
                                                                                                                                                                                                                                                                                                                                                                               (objects, attributes, relation) and to interpret the resulting                                                                                                                                                                                                             name mangling to resolve overloading) there is not always                                                                                                                                                                                                                                                                                                         omitted for readability reasons. However, their size in this                                                                                                                                                                                                                                                                    shapes provide individual rotate functions. In other words,                                                                             really required for all components in a narrower sense.
components may be used to populate the unified architec-                                                                                                   Figure 1. Overview.                                                                                                                        which share all attributes in the intersection of two sets of                                                                                       (C, F) ∈ R if and only if component C is required           section explains how interesting relationships can be auto-        Beyond these relationships between components and                                                                                  are many possible combinationsysis will yield an overestimated search space, whereas                                                                                                                                                                                                                                1997.
                                                                                                                                                                                                                                                                                                                                                                               concept lattice accordingly.                                                                                                                                                                                                                                               a direct mapping from the subprograms in the execution                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Concept #6 represents the feature “draw lines” and is              concepts. The two nodes have a direct infimum (not shown                                                                                                                                                   Furthermore, our technique goes beyond Wilde and                  only a menu selection or a similar interaction is necessary.     example, we want to investigate the relation between the
                                                                   gate binary relations (see Section 2).                                                                                                                                                                                            attributes.                                                                                                                                                                                                     matically derived.                                                                                                                                                                                        In both experiments, we considered subprograms as                                                                                                                                                                                                                                            picture is a linear function of their number of components                                                                                                                                                                                                                                                                      the rotate feature is implemented specific to each shape,              dynamic analyses exactly tell which parts are really used                                                                                                                                                                                                                         [15]Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies
ture. To this end, code needs to be adjusted, reengineered,                                                                             We want to point out that not all non-functional                                                                                                                                                                                                                                                                 when feature F is invoked; a subprogram is                                                                                  features, further useful aspects between features on one
trace back to the original source. Because we dealt in our                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   used for drawing rectangles, polygons, and polylines, as               in Figure 6) and add the same components to the circle and                                                                                                                                                                                                                  In the case of a batch system, one may vary command line         concept lattice based on dynamic information and static
                                                                                                                                                                                                                                    a1      a2      a3      a4      a5      a6      a7      a8                                                                                                                                                                                                                          As already abstractly described in Section 2, the fol-                                                                                                                                             components. However, in our simple implementation, we                                                                                                                                                                                                                                            (except for the bottom element that contains 136 compo-                                                                                                                                                                                                                                                                         i.e., there is no generic component that draws all different                                                                            Scully’s technique in that it also allows to derive relevant                                                                                                                                                        Using Concept Analysis’, Proc. of the ACM SIGSOFT Sym-
or wrapped. However, changing or wrapping the code is              Integration into a Product Family Process. A simple               requirements, e.g., time constraints, can be easily mapped                                                                                                          Graphically, the concept lattice for the example relation             3.1. Context for Feature and Components                                   required when it needs to be executed; a global                                                                             hand and between components on the other hand may beone would expect. The generality of this feature becomes               ellipse features. The components inherited via these two                                                                            at runtime (though for a particular run only). However,                                                                                 switches and may have to provide different sets of test data     software architecture recovery techniques.
                                                                                                                                                                                                                           o1       !       !                                                                                                                                                                                                                                                                        lowing base relationships can be derived from the sparse                                                                             case study with C code, object code names were identical         do not handle variable accesses. Hence, not all required                                                                                                                                                                                                                                         nents, mostly initialization and GUI code and very basic                                                                                                                                                                                                                                                                        shapes, which would have been an interesting finding in                                                                                  relationships between components and features by means                                                                                                                                                              posium on the Foundations of Software Engineering, pp. 99-
only done in very late phases in moving toward a product           process for feature-based reengineering toward product            to components, i.e., our technique primarily aims at func-                                                                                                      in Table 1 can be represented as a directed acyclic graph                                                                                           variable is required when it is accessed (used or                                                                           derivedimmediately obvious in the concept lattice as it is located            nodes are very basic components of the lowest regions of                                                                            Chen and Rajlich’s technique could be helpful in a later                                                                                to invoke a feature. However, in order to find suitable test
                                                                                                                                                                                                                           o2                       !       !       !                                                                                                                                                                                                                                                representation of the lattice (note the duality in the inter-                                                                        to source names. If this is not the case, one either tolerates                                                                                                                                                                                                                                                                                                    functions, and was too large to be drawn accordingly; as a                                                                                                                                                                                                                                                                      terms of reuse.                                                                                                                         of concept analysis, whereas Wilde and Scully’s technique                                                                                                                                                           110, November, 1994.
family. Reverse engineering can also assist in earlier             families can be described as follows:                             tional features. However, in some cases, it is possible to                                                                                                      whose nodes represent concepts and whose edges denote                        Components will be considered objects of the formal                    changed); a composite component is required when                                                                             • If γ(c1) < γ(c2) holds for two components c1 and c2,
[…] phases and, thus, Bayer et al. rightly demand an early integration of reverse engineering into a product family approach [1]. Early reverse engineering is needed to derive first coarse information on existing system components (assets). A product family analyst needs this information early on to investigate feasibility and to estimate the costs of different alternative ways to arrive at a suitable product family architecture […]

1. The economically relevant features are ascertained by product family engineers and market analysts.
2. The feature component map is derived based on the identified relevant features. […] with respect to quality and effort.
3. The previously derived feature component map gives additional insights into dependencies among features […]

[…] isolate non-functional aspects, like security, in code and map them to specific components. For instance, one could concentrate all network accesses in one single component to enable controlled secure connections.

The remainder of this article is organized as follows. Section 2 introduces concept analysis. Section 3 explains how concept analysis can be used to derive the feature […]

[Table 1: Example relation]

A pair (O, A) is called a concept if A = σ(O) ∧ O = τ(A) holds, i.e., all objects share all attributes. For a concept c = (O, A), O is the extent of c, denoted by extent(c), and A is […]

[…] the superconcept/subconcept relation < as shown in Figure 2. The most general concept is called the top element and is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥. The combination of the graphical representation in Figure 2 and the contents of the concepts in Table 2 together form the concept lattice. The complete information […]

[…] context, whereas features will be considered attributes. Note that in the reverse case, the concept lattice is simply inverted but the derived information will be the same. The set of relevant features will be determined by the product family experts. For components, we can consider the following alternatives depending on how much knowledge of the system architecture is already available: […]

[…] low-level components are detected. […] one of its parts is required.

In order to obtain the relation, a set of usage scenarios needs to be prepared where each scenario preferably exploits only one relevant feature. Then the system is used according to the set of usage scenarios, one at a time, and the execution traces are recorded. An execution trace contains all required low-level components for a usage scenario […]

[…] interpretation):
• A component, c, is required for all features at and above γ(c) – as defined by (1) – in the lattice.
• A feature, f, requires all components at and below µ(f) – as defined by (2) – in the lattice.
• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top […]

[…] then component c2 requires component c1.
• If µ(f1) < µ(f2) holds for two features f1 and f2, then feature f1 is based on feature f2.
One has to note that the latter relationship between features safely holds for the analyzed system only, i.e., this relationship is not necessarily true for the features as such, […]

[…] divergences between names (mostly, names are similar enough) or has to reverse name mangling.

4. Case Study

As a case study, we analyzed the Xfig system [18] (version 3.2.1), consisting of about 76 KLOC written in the programming language C. In this section, we will […]

The resulting concepts contain subprograms grouped together according to their usage for features. Note that the more general subprograms can be found at the lower concepts in the lattice since they are used for many features, while specific components are in the upper region of the lattice. Hence, the concept lattice also reflects the level of abstraction of these subprograms within the given set of […]

[…] comparison point: the text drawing concept, marked as node #1, has 29 components). As Figure 4 shows, there are a few concepts containing most of the components (i.e., subprograms) of the system. The lattice contains 47 concepts. 26 of them introduce at least one new component, i.e., to these nodes, a component is attached (more precisely, a concept C introduces a component if there exists a […]

[…] in the middle level of the lattice.

The framed area in Figure 4 has a simpler structure than the rest of the lattice. This part deals with circles and ellipses, and its details are shown in Figure 6. Each node in Figure 6 contains two sets: The upper set contains all components attached to the node, i.e., those components, c, for which γ(c) = N; the lower set contains all features of N, including those inherited from other concepts. The […]

[…] the lattice, which indicates that ellipses and circles are widely separate from all other objects.

Second experiment. In a second experiment, we analyzed the edit mode rotate, which comes in two variants: clockwise and counterclockwise. The first ten shapes in Figure 5 were drawn and rotated once clockwise and once counterclockwise, which resulted in 20 scenarios. The […]

General observations. Our experience is that applying our method is easy in principle. However, running all scenarios by hand is time consuming. This may be facilitated by the presence of test cases that allow an automated replay of various scenarios. Because Xfig has a GUI, running a single scenario by hand is an easy task. However, one has to pay attention not […]

[…] phase, in which the system needs to be more rigorously analyzed. The purpose of our technique is to derive the feature component map. It handles the system as a black box and, hence, does not give insight into internal aspects […]

Wilde and Scully [17] also use dynamic analysis to localize features as follows:
1. The invoking input set I (i.e., a set of test cases or – in […]

[…] only localizes a feature. The derived relationships are important information for product family experts and represent additional dependencies that need to be considered in a decision for certain features and components.

6. Conclusions

A feature component map describes which components […]

[…] data, one might need some knowledge of internal details of a system.

The implementation of this technique was surprisingly simple. We opportunistically put together a set of publicly available tools and wrote a few Perl scripts (140 LOC in total) for interoperability, which took us just one day. A drawback of our simple implementation is that one has to run the system for each usage scenario from the beginning […]

References

[1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, J.-M., ‘Transitioning Legacy Assets - a Product Line Approach’, Proceedings of the SIGSOFT Foundations of Software Engineering, Toulouse, pp. 446-463, Association of Computing Machinery (ACM), 1999.
[2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Study’, Proc. of the 21st International Conference on Software Engineering (ICSE’99), (Los Angeles, CA, USA), pp. […]
[16] Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Techniques and Process Aspects for Product Line Development’, Proc. of the 10th International Software Process Workshop, June 1996, Ventron FR.
[17] Wilde, N. and Scully, M.C., ‘Software Reconnaissance: Mapping Program Features to Code’, Software Maintenance: Research and Practice, vol. 7, pp. 49-62, 1995.
[18] Xfig system, http://www.xfig.org.









Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon
University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product line based on existing assets. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as members of a product line promises advantages, like higher potential for reuse, lower costs and shorter time to market. There are many approaches to developing new product lines from scratch [2, 10]. However, according to Martinez in [15], most successful examples of product lines at Motorola originated in a single separate product. Only over time did a shared architecture for a product line evolve. Moreover, large investments create a reluctance to introduce a product line approach that ignores existing assets. Hence, a product line approach generally has to cope with existing code.

[…] particularly interesting and required components, and further expensive analyses can be aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is largely automatic.

The remainder of this article is organized as follows. Section 2 gives an overview, Section 3 explains how concept analysis can be used to derive the feature component map, and Section 4 describes our experience with this technique in an example. Section 5 references related research, and Section 6 concludes the paper.

2. Overview

The technique described here is based on the execution traces generated by a profiler for different usage scenarios. One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

[…] grams a set of components C. A component corresponds to an object of the formal context, whereas a feature will be considered an attribute.

The relation R for the formal context necessary for concept analysis is defined as follows (where c ∈ C, f ∈ F):

    (c, f) ∈ R if and only if component c is required when feature f is invoked; a subprogram is required when it needs to be executed.

R can be visualized using a relation table as shown in Figure 1:

          f1   f2   f3   f4   f5   f6   f7   f8
    c1    ×    ×
    c2              ×    ×    ×
    c3              ×    ×         ×    ×    ×
    c4              ×    ×    ×    ×    ×    ×

Figure 1. Relation Table

The resulting concept lattice is shown in Figure 2. We use the sparse representation for visualization, showing an attribute/feature at the uppermost concept in the lattice where it is required (so the attributes spread from this node down to the bottom). For a feature f, this node is denoted by µ(f). Analogously, a node is marked with an object/component c ∈ C in the sparse representation if it represents the most special concept that has c in its extent. This unique node is denoted by γ(c). Hence, an object/component c spreads from the node γ(c), to which it is attached, up to the top.

[Figure 2: concept lattice for the relation in Figure 1 (sparse representation); visible node labels: f3, f4; f1, f2 with c1; f5 with c2; f6, f7, f8 with c3; annotations "f5 applies to these concepts" and "c3 applies to these concepts"]

3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the previous section gives a lattice, from which interesting relationships can be derived. These relationships can be derived fully automatically and presented to the analyst such that the complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships.

The following base relationships can be derived from the sparse representation of the lattice:
• A component, c, is required for all features at and above γ(c) in the lattice.
• A feature, f, requires all components at and below µ(f) in the lattice.
• A component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.
• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).
• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∧ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element, starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.
• Components jointly required for two features, f1 and f2, are described by µ(f1) ∨ µ(f2); graphically depicted, one […]

3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:
• GNU C compiler gcc to compile the system using a command line switch for generating profiling information,
• GNU object code viewer nm,
• GNU profiler gprof,
• concept analysis tool concepts [8],
• graph editor Graphlet [3] to visualize the concept lattice,
• and a short Perl script to ascertain the executed functions in the execution trace and to convert the file formats of concepts and Graphlet (the script has just 225 LOC).

The fact that the subprograms are extracted from the object code makes the […]
    Reverse engineering helps creating a product line from                                                                                                       c4               these concepts                        ascertains in the lattice the closest common node toward
existing systems by identifying and analyzing the compo-           Concept Analysis. Concept analysis is a mathematical                                                                                                 the bottom element starting at the nodes to which f1 and
nents and deriving the individual architectures. They can          technique that provides insights into binary relations. The         <                                                  concept
                                                                                                                                                         Figure 2. Concept Lattice                                      f2, respectively, are attached; all components at and
then be unified to a product line architecture which is pop-        mathematical foundation of concept analysis was laid by
                                                                   Birkhoff in 1940. The binary relation in our specific appli-          In order to ascertain the relation table, a set of usage                        below this common node are those jointly required for
ulated by the derived components.
                                                                   cation of concept analysis to derive the feature component        scenarios needs to be prepared where each scenario trig-                           these features.
    As stated in Bayer et. al [1], early reverse engineering
is needed to derive first coarse information on existing            map states which components are required when a feature           gers exactly one relevant feature1. Then the system is used                      • Components required for all features can be found at the
assets needed by a product line analyst to set up a suitable       is invoked. The detailed mathematical background of con-          according to the set of usage scenarios. For each usage                            bottom element.
product line architecture.                                         cept analysis can be found in [7,13,14].                          scenario, the execution trace is recorded.                                       • Features that require all components can be found at the
    One important piece of information for a product line                                                                               An execution trace contains all called subprograms for                          top element.
analysis that tries to integrate existing assets is the so-        3. Feature Component Map                                          a usage scenario or an invoked feature, respectively.                                The information described above can be derived by a
called feature component map that describes which com-                                                                               Hence, each system run yields all required components for                        tool and fed back to the product line expert. As soon as a
ponents are needed to implement a particular feature. A               In order to derive the feature component map via con-          a single scenario that exploits one feature. A single col-                       decision is made to re-use certain features, all components
feature is a realized (functional as well as non-functional)       cept analysis, one has to define the formal context                umn in the relation table can be obtained per system run.                        required for these features (easily derived from the con-
requirement (the term feature is intentionally weakly              (objects, attributes, relation) and to interpret the resulting    Applying all usage scenarios provides the relation table.                        cept lattice) form a starting point for further static analyses
defined because its exact meaning depends on the specific            concept lattice accordingly.
                                                                                                                                                                                                                      to investigate quality (like maintainability, extractability,
context). Components are computational units of a soft-                                                                                                                                                               and integrability) and to estimate effort for subsequent
                                                                   3.1. Context for Feature and Components
ware architecture.                                                                                                                                                                                                    steps (wrapping, reengineering, or re-development from
    On the basis of the feature component map and addi-                                                                                1. It is possible to combine multiple features into one scenario, mak-
                                                                      The set of relevant features F will be determined by the            ing the interpretation of the resulting concept lattice more compli-        scratch).
tional economic reasons, a decision is made for particu-           product line experts. We consider all the system’s subpro-             cated. This is beyond the scope of this paper.
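The workflow in this excerpt (one scenario per feature, one recorded trace per run yielding one column of the relation table, then concept analysis with the sparse labels µ(f) and γ(c)) can be illustrated with a small Python sketch. It is not the authors' tool: the traces, the feature names f1 to f3, and the component names are hypothetical; components are treated as objects and features as attributes, as in the excerpt; and all concepts are found by brute-force closure of feature subsets, which only scales to toy inputs.

```python
from itertools import combinations

# Hypothetical relation table (not from the paper): one usage scenario per
# feature, mapped to the components (subprograms) recorded in its trace.
# Each entry corresponds to one column of the relation table.
TRACES = {
    "f1": {"init", "parse", "draw"},
    "f2": {"init", "parse", "save"},
    "f3": {"init", "draw", "zoom"},
}
FEATURES = frozenset(TRACES)
COMPONENTS = frozenset().union(*TRACES.values())


def sigma(components):
    """Features whose traces contain every given component."""
    return frozenset(f for f in FEATURES if set(components) <= TRACES[f])


def tau(features):
    """Components contained in the traces of every given feature."""
    result = set(COMPONENTS)
    for f in features:
        result &= TRACES[f]
    return frozenset(result)


def concepts():
    """All (extent, intent) pairs, by brute-force closure of every feature subset."""
    found = set()
    for k in range(len(FEATURES) + 1):
        for subset in combinations(sorted(FEATURES), k):
            extent = tau(subset)
            found.add((extent, sigma(extent)))
    return found


def mu(feature):
    """Most general concept whose intent contains the feature (the node labeled f)."""
    extent = tau({feature})
    return extent, sigma(extent)


def gamma(component):
    """Most special concept whose extent contains the component (the node labeled c)."""
    intent = sigma({component})
    return tau(intent), intent


if __name__ == "__main__":
    print(len(concepts()))  # 7
    # Components jointly required for f1 and f2: extent of their closest common node toward the bottom.
    print(sorted(mu("f1")[0] & mu("f2")[0]))  # ['init', 'parse']
    # Features to which draw and parse jointly contribute: intent of their closest common node toward the top.
    print(sorted(gamma("draw")[1] & gamma("parse")[1]))  # ['f1']
    # Components required for all features (extent of the bottom element).
    print(sorted(tau(FEATURES)))  # ['init']
    # Features that require all components (intent of the top element).
    print(sorted(sigma(COMPONENTS)))  # []
```

The four queries mirror the interpretation rules in the excerpt: jointly required components are read off the extents of the µ nodes, jointly implemented features off the intents of the γ nodes, and the bottom and top elements give the components required for all features and the features requiring all components.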
The paper was selected for a TSE special issue on ICSM.
Before we submitted the paper, we asked the editors for the page limit.
They told us there was no limit.
So we submitted a paper with 20 pages.
The reviewers asked us to add more detail and the version that was

Locating Features in Source Code
Thomas Eisenbarth, Rainer Koschke, and Daniel Simon

    Abstract: Understanding the implementation of a certain feature of a system requires identifying the computational units of the system that contribute to this feature. In many cases, the mapping of features to the source code is poorly documented. In this paper, we present a semi-automatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior.

…a feature-oriented search focusing on the components of interest is needed.
    This article describes a process and its supporting techniques to identify those parts of the source code which implement a specific set of related features. The process is automated to a large extent. It combines static and dynamic

…specifically required for a feature, concept analysis additionally allows one to derive detailed relationships between features and computational units. These relationships iden-

…observable behavior of the system that can be triggered by the user.
    Example. Our fictitious drawing tool FIG (which re-

Fig. 1. Conceptual model in UML notation. [Diagram relating scenario, feature, and computational unit via the associations "invokes" and "implemented by"; basic block, routine, and module are also shown.]

    Scenario                Executed computational units
    Draw-circle-diameter    draw, setDiameter
    Draw-circle-radius      draw, setRadius
    Move-circle             draw, setDiameter, move
    Color-circle            draw, setDiameter, color
Fig. 3. Execution profiles for Fig. 2.

…analysis, similarly to Wilde and Scully's technique [7]. If the system is used as described by the scenario, the execution trace lists the sequence of all performed calls for

III. Formal Concept Analysis

    This section presents the necessary background information on formal concept analysis. Readers already familiar with concept analysis can skip to the next section.
    Formal concept analysis is a mathematical technique for

    The set L of all concepts of a given formal context and the partial order ≤ form a complete lattice, called concept lattice:

$$\mathcal{L} = \{(O, A) \in 2^{\mathcal{O}} \times 2^{\mathcal{A}} \mid A = \sigma(O) \text{ and } O = \tau(A)\} \qquad (5)$$

    The infimum (⊓) of two concepts in this lattice is computed by intersecting their extents as follows:

$$(O_1, A_1) \sqcap (O_2, A_2) = (O_1 \cap O_2,\ \sigma(O_1 \cap O_2)) \qquad (6)$$

    The infimum describes a set of common attributes of

(a) A formal context.

          a1  a2  a3  a4  a5  a6  a7
    o1    ×           ×       ×   ×
    o2        ×       ×   ×       ×
    o3            ×       ×   ×   ×

(b) Concepts for the formal context.

    ⊤     ({o1, o2, o3}, {a7})
    c1    ({o1, o2}, {a4, a7})
    c2    ({o1, o3}, {a6, a7})
    c3    ({o2, o3}, {a5, a7})
    c4    ({o1}, {a1, a4, a6, a7})
    c5    ({o2}, {a2, a4, a5, a7})
    c6    ({o3}, {a3, a5, a6, a7})
    ⊥     (∅, {a1, a2, a3, a4, a5, a6, a7})

Fig. 4. An example relation between objects and attributes. The corresponding concepts that can be derived from the formal context are listed on the right.

    Sect. III                  Main part
    object o                   computational unit u
    set of objects O           set of computational units U
    all objects 𝒪              all computational units 𝒰
    attribute a                scenario s
    set of attributes A        set of scenarios S
    all attributes 𝒜           all scenarios 𝒮
    incidence relation I       invocation table I
Fig. 7. Translation from the identifiers of Sect. III and the identifiers used from here on, which instantiate formal concept analysis.

1. Scenario creation: Based on features (either known initially or discovered during incremental analysis), the domain expert creates scenarios.
2. Static dependency-graph extraction: The static dependency graph of the system under analysis is extracted.
3. Dynamic analysis: The system is used according to selected scenarios.
4. Interpretation of concept lattice: The data yielded by the dynamic analysis is presented to and interpreted by the analyst. Relevant computational units are identified.
5. Static dependency analysis: The analyst searches the system for additional computational units that are relevant

…distinctive; that is, they should invoke all relevant features but as few other features as possible to ease the mappings from scenarios to features and from features to computational units (often it is unavoidable to invoke features that are not of interest for the task at hand).

…cases is to reveal errors, and hence test cases tend to be complex and to cover many features. In contrast, scenarios for our feature location technique should be simpler and invoke fewer features to differentiate the computational units more clearly.
    In order to explore variations of a feature, the domain expert provides several scenarios, each triggering a feature variation with a different set of inputs. To obtain effective and efficient coverage, he builds equivalence classes of relevant input data. Identifying equivalence classes may require knowledge of internal details of a system.
    The scenarios are documented for future use similarly to test cases. Additionally, the documentation includes the features invoked by the scenarios. If the domain expert also specifies the expected result of the scenario, the scenario may also be used as a simple test case.

C. Dynamic Analysis

Basic Interpretation

    Concept analysis applied to the formal context described in the last section yields a lattice from which interesting relationships can be derived. These relationships can be derived fully automatically and presented to the analyst. Thus, the analyst has to know how to interpret the derived relationships, but does not need to be familiar with the theoretical background of lattices.
    The following base relationships can be derived from the sparse representation of the lattice (note the duality):
• A computational unit u is required for all scenarios at

…presented empirical data indicating that less expensive and (theoretically) less precise techniques to resolve function pointers reach the precision of more expensive and (theoretically) more precise techniques [20] due to the common way of using function pointers (as opposed to pointers to stack and heap objects).
   The mapping is in general not injective; that is, a com-                                                                                                                                                                                                                                                                                        tify computational units jointly required by any subset of       Computational units. The exact notion of computational                                                                              two sets of objects. Similarly, the supremum ( ) is de-                                                                                                                                                                                                                                                                                                                                                                                                                         and above γ(u) in the lattice; for instance, SetDiameter is
                                                                                            analyses and uses concept analysis—a mathematical tech-                       sembles XFIG [5]) allows a user to draw, move, and color                                  this scenario. Since our technique aims at only identify-                                                                                                                                                          analyzing binary relations. The mathematical foundation                                                                                                                                                                                                                                                                             to selected features.                                                The goal of the dynamic analysis is to find out which
putational unit may contribute to several features. Our                                                                                                                                                                                                                                                                                            features and classify computational units as low-level or        unit is a generic parameter to our technique and depends                                                                            termined by intersecting the intents:                                                                                                                                                                                                                                                                                                                                                                    The following subsections describe how to achieve these                required for Draw-circle-diameter, Move-circle, and Color-
technique allows to distinguish between general and specific                                 nique to investigate binary relations—to derive correspon-                    different objects, such as rectangles, circles, ellipses, and so                           ing the computational units rather than at the order of the                                                                                                                                                        of concept analysis was laid by Birkhoff [21] in 1940. For                                                                                                                                                                                                                                                                              The different roles of human resources for these activ-         computational units contribute to a given set of features.
                                                                                                                                                                                                                                                                                                                                                   high-level with respect to the given set of features.            on the task and system at hand. In principle, there is no                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 goals. The basic process of lattice interpretation is depicted            circle according to Fig. 10.
computational units with respect to a given set of features.                                dences between features and computational units. Concept                      forth. From the viewpoint of an analyst who is interested                                 computational units’ execution, we need only the execution                                                                                                                                                         more detailed information on formal concept analysis we re-                                                                                                                                                                                                                                                                         ities are (human resources are highlighted in the process         Each feature is invoked by at least one of the prepared
                                                                                                                                                                                                                                                                                                                                                      Example. Intersecting the execution profiles in Fig. 3         limit to the granularity of computational units: One could                                                                                  (O1 , A1 ) (O2 , A2 ) = (τ (A1 ∩ A2 ), A1 ∩ A2 )   (7)                                                                                                                                                                                                                                                                                                                                                                                                                  • A scenario s requires all computational units at and be-
For a set of features, it also identifies jointly and distinctly
                                                                                            analysis additionally yields the computational units jointly                  in the implementation of circle operations in FIG, the abil-                              profile. The execution profile of a given program run is                                                                                                                                                             fer to [22], where the mathematical foundation is explored.                                                                                                                                                                                                                                                                         diagrams by a UML actor icon):                                                                                                     in Fig. 9.
required computational units.                                                                                                                                                                                                                                                                                                                      additionally shows that the computational units jointly re-                                                                                                                                                                                                                                                     ({o1 , o2 , o3 }, {a7 })                                                                                      (∅, {a7 })                                                                                                  scenarios.                                                                                                                                 low µ(s) in the lattice; for instance, Color-circle requires
                                                                                                                                                                          ity to draw, to move, and to color a circle are three relevant                            the set of computational units called during the run with-                                                                                      use basic blocks, routines, classes, modules, or subsystems.
   The presented technique combines dynamic and static                                      and distinctly required for a set of features.                                                                                                                                                                                                         quired for Draw-circle-diameter, Move-circle, and Color-                                                                               Concept analysis deals with a relation I ⊆ O×A between        The supremum yields the set of common objects, which                                                                                                                                                                                                               • The analyst is the person interested in how features map           The process that deals with the dynamic analysis is                                                                                     color, setDiameter, and draw according to Fig. 10.
                                                                                                                                                                          features.                                                   2                             out information about the order of execution. From the                                                                                          Subsystems as computational units are suitable to obtain                                                                                                                                                                                                              ({o1 , o3 }, {a6 , a7 })                                                                                    (∅, {a6 })                                                                                                                                              D.1 Scenario Selection
analyses to rapidly focus on the system’s parts that re-                                       An advantage of starting with features is that domain                                                                                                                                                                                               circle are draw and setDiameter, where draw is required                                                                             a set of objects O and a set of attributes A. The tuple C =      share all attributes in the intersection of two sets of at-                                                                                                                                                                                                        onto source code. She interprets the concept lattice and          shown in more detail in Fig. 8. The inputs to the process                                                                                  • A computational unit u is specific to exactly one scenario
late to a specific set of features. Dynamic information is                                                                                                                    Every computational unit (excluding dead code) con-                                    execution profile, we gather the fact that a computational                                                                                       an overview for very large systems. Considering routines,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    A number of execution profiles is selected in order to
                                                                                            knowledge from the user’s perspective may be exploited,                                                                                                                                                                                                for all scenarios.                                        2                                                                         (O, A, I) is called a formal context. For a set of objects       tributes.                                                                                                                                                                                                                                                          performs the static analysis.                                     are source code and a set of scenarios created by process                                                                                  s if s is the only scenario on all paths from γ(u) to the
gathered based on a set of scenarios invoking the features.                                                                                                               tributes to the purpose of the system and thus corresponds                                unit has been executed at least once. We ignore the dura-                                                                                       methods, subprograms, etc. as computational units gives                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   set up the context. Execution profiles may be recombined
                                                                                            which is especially useful for external change requests and                                                                                                                                                                                                                                                                                                                                O ⊆ O, the set of common attributes σ(O) is defined as:                                                                                                                                                                                                                                                                              • The domain expert designs the scenarios and lists the




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         fina y accepted had 25 pages
Rather than assuming a one-to-one correspondence between                                                                                                                                                                                                                                                                                              The information gained by concept analysis is used to                                                                                                                                                The concept lattice for the formal context in Fig. 4(a)                ({o1 , o2 }, {a4 , a7 })                                           ({o2 , o3 }, {a5 , a7 })                              (∅, {a4 })                                     (∅, {a5 })                                                                         step 1 in Fig. 6. We proceed as follows:                                                                                                   top element; for instance, color is specific to Color-circle
                                                                                                                                                                          to at least one feature—be it a very basic feature, such                                  tion of the computational unit’s execution because compu-                                                                                       an overview at the global declaration level, whereas classes                                                                                                                                                                                                                                                                                                                                                                                                                                                                              to analyze various aspects of a system, where execution
features and scenarios as in earlier work, we can now handle                                error reports expressed in the terminology of a program’s                                                                                                                                                                                              guide a subsequent static analysis along the static depen-                                                                                                                                           can be depicted as a directed acyclic graph whose nodes                                                                                                                                                                                                            invoked features for each scenario.                               3.1 Compile for recording: The source code is compiled                                                                                     according to Fig. 10.
scenarios that invoke many features.                                                                                                                                      as the ability of the system to start or terminate. Yet,                                  tation time hardly gives hints for feature-specific compu-                                                                                       and modules lie in between subsystem and global declara-                    σ(O) = {a ∈ A | (o, a) ∈ I for all o ∈ O}        (1)                                                                                                                                                                                                                                                                                                                                                                                                          profiles and scenarios can be reused.
                                                                                            problem domain.                                                                                                                                                                                                                                                                                                                                                                                                                                             represent the concepts and whose edges denote the                                                                                                                                                                                                                  • The user is the person who uses the system according            with profiling options or is instrumented to obtain the ex-
                                                                                                                                                                                                                                                                                                                                                   dency graph in order to narrow the computational units to        tion level. Basic blocks as computational units are only• Scenarios to which two computational units u1 and
   Furthermore, we show how our method allows incremen-
                                                                                               The remainder of this article is organized as follows.                     only few features may actually be of interest to the ana-                                 tational units. Once the specific computational units have                                                                                                                                                                                                                                                                                                    ({o1 }, {a1 , a4 , a6 , a7 })                                ({o3 }, {a3 , a5 , a6 , a7 })                                ({o1 }, {a1 })                             ({o3 }, {a3 })       to the selected scenarios.                                        ecution profile.                                                     Example. The analyst of FIG may first be interested in
tal exploration of features while preserving the “mental                                                                                                                                                                                                                                                                                           those that form self-contained and understandable feature-       adequate for smaller systems or parts of a system where              Analogously, the set of common objects τ (A) for a set of      superconcept-subconcept relation ≤ as shown in Fig. 5(a).                                                                                                                                                                                                                                                                                                                                                                                                                       u2 jointly contribute can be identified by the supremum
                                                                                            Sect. II gives an overview of our technique and introduces                    lyst for her task at hand. In the following, we assume that                               been identified through our technique, other techniques,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   All activities except the static dependency graph extrac-      3.2 Scenario execution: The system is executed by a              the two different ways to draw a circle. She would therefore
map” the analyst has gained through the analysis.                                                                                                                                                                                                                                                                                                  specific computational units. Computational units that            more detail is needed due to the likely information over-          attributes A ⊆ A is defined as:                                   The most general concept is called the top element and                                                                                                                                                                                                                                                                                                                                                                                                                          γ(u1 ) γ(u2 ). In the lattice, the supremum is the closest
                                                                                            the basic concepts. Sect. III introduces concept analysis.                    only a subset of features is relevant. Consequently, only                                 such as static or dynamic slicing [8], [9], can be used to                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             tion (which is done only once) benefit from the knowledge          user according to the scenarios and execution profiles are        select the two scenarios Draw-circle-diameter and Draw-
When we submitted the camera-ready, the production people stepped in […]

Keywords: program comprehension, formal concept analysis, feature location, program analysis, software architecture recovery

I. Introduction

Understanding how a certain feature is implemented is a major problem of program understanding. Before real understanding starts, one has to locate the implementation of the feature in the code. Systems often appear as a large number of modules, each containing hundreds of lines of code. It is in general not obvious which parts of the source code implement a given feature. Typically, existing documentation is outdated (if it exists at all), the system's original architects are no longer available, or their view is outdated due to changes made by others. So maintenance introduces incoherent changes which cause the system's overall structure to degrade [1]. Understanding the system in turn becomes harder any time a change is made to it.

One option, when trying to escape this vicious circle, […]

[…] Sect. IV describes the process for locating and analyzing features in more detail. In Sect. V, we report on two case studies conducted to validate our approach. The related research in the area is summarized in Sect. VI.

II. Overview

The goal of our technique is to identify the computational units that specifically implement a feature as well as the set of jointly or distinctly required computational units for a set of features. To this end, the technique combines static and dynamic analyses.

This section gives an overview of our technique, describes the relationships among features, scenarios, and computational units (summarized in Fig. 1), and explains what kind of dynamic information is used as input to our technique. The section also introduces a simple example that we will use throughout the description of the method in the following sections. The example is inspired by a previous case study [4] in which we analyzed the drawing tool […]

[…] the computational units required for these features are of interest, too. The feature-unit map, as one result of our technique, describes which computational units implement a given set of relevant features.

Scenario. Features are abstract descriptions of a system's expected behavior. If a user wants to invoke a feature of a system, he needs to provide the system with adequate input to trigger the feature. For instance, to draw a circle, the user of FIG needs to press a certain button on the control panel to select the circle drawing operation, then position the cursor on the drawing area to specify the center of the circle, specify the diameter by moving the mouse, and finally press the left mouse button to finalize the circle. Such sequences of user inputs that trigger actions of a system with observable result [6] are called scenarios.

Our technique requires a set of scenarios that invoke the features the analyst is interested in. A scenario s invokes a feature f if f's result can be observed by the user when the system is used as described by scenario s. A scenario […]

[…] obtain the order of execution if required. These techniques can then be applied in a more goal-oriented way by focusing on the most feature-specific computational units yielded by our technique.

Feature-unit map. Our technique derives the feature-unit map through concept analysis, a mathematically sound technique. In our application, concept analysis, simply stated, mutually intersects the execution profiles of all scenarios and all resulting intersections to obtain the specific computational units for a feature and the jointly and distinctly required computational units for a set of features.

Example. FIG allows the user to draw a circle either by diameter or by radius. An analyst who is interested in the differences between these two circle operations, and in their differences from other circle operations such as moving and coloring, will set up the scenarios listed in Fig. 2. Figure 3 lists the computational units executed for the scenarios in Fig. 2. Intersecting the execution profiles shows that setRadius is specific to feature Draw-circle-radius, move to Move-circle, […]

[…] are only very basic computational units used as building blocks for other computational units but not containing any application-specific logic are sorted out. Additional static analyses, like strongly connected component identification, dominance analysis, and program slicing [8], support the search for the units of interest.

For large and complex systems, our approach can be applied incrementally as described in this paper.

Applicability

The retrieval of the feature-unit map is based on dynamic information: all computational units that are executed for a scenario are collected. The scenario describes how to invoke a feature. This section describes the assumptions we make about features, scenarios, and computational units.

Features. Our technique is primarily suited for functional features that may be mapped onto computational units. In particular, non-functional features, such as robustness, reliability, or maintainability, do not easily map to computational […]

[…] load to the analyst.

For practical reasons, for this paper we decided to use routines as the computational unit of choice, where a routine is a function, procedure, subprogram, or method according to the programming language. For the case studies presented later in this paper, routines were appropriate.

Static and dynamic dependencies. The results from concept analysis based on dynamic information are used to guide the analyst in her static analysis, that is, her inspection of the static dependency graph. We use dynamic information only as a guide and not as a definite answer because dynamic information depends upon suitable input data and the test environment in which the scenarios are executed.

The static dependency graph can be extracted from procedural, functional, as well as object-oriented programming languages. Because execution profiles can be recorded for these languages, too, our technique is applicable to all these languages. However, the precision of the static extraction influences the ease of the analyst's inspection of the static dependencies, and static analysis is inherently more […]

[…]

$$\tau(A) = \{\, o \in O \mid (o, a) \in I \text{ for all } a \in A \,\} \tag{2}$$

A formal context can be represented by a relation table, where the columns hold the objects and the rows hold the attributes. An object oᵢ and attribute aⱼ are in the relation I iff the cell at column i and row j is marked by "×". As an example, a binary relation between arbitrary objects and attributes is shown in Fig. 4(a). For that formal context, we have:

$$\sigma(\{o_1\}) = \{a_1, a_4, a_6, a_7\}, \qquad \tau(\{a_6, a_7\}) = \{o_1, o_3\}$$

A tuple c = (O, A) is called a concept iff A = σ(O) and O = τ(A), that is, all objects in c share all attributes in c. For a concept c = (O, A), O is called the extent of c, denoted by extent(c), and A is called the intent of c, denoted by intent(c). Informally speaking, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. In Fig. 4(b), all concepts […]

[…] is denoted by ⊤. The most special concept is called the bottom element and is denoted by ⊥.

The concept lattice can be visualized in a more readable, equivalent way by marking only the graph node with an attribute a ∈ A whose represented concept is the most general concept that has a in its intent. Analogously, a node will be marked with an object o ∈ O iff it represents the most special concept that has o in its extent. The unique element in the concept lattice marked with a is therefore:

$$\mu(a) = \mathop{\sqcup} \{\, c \in L(C) \mid a \in \mathit{intent}(c) \,\} \tag{8}$$

The unique element marked with object o is:

$$\gamma(o) = \mathop{\sqcap} \{\, c \in L(C) \mid o \in \mathit{extent}(c) \,\} \tag{9}$$

We will call a graph representing a concept lattice using this marking strategy a sparse representation of the lattice. The equivalent sparse representation of the lattice in Fig. 5(a) is shown in Fig. 5(b). The content of a node N in this representation can be derived as follows: […]

[Fig. 5. The concept lattices for the example context in Fig. 4: (a) full concept lattice; (b) sparse representation.]

[Fig. 6: IDEF0 process diagram; legible labels include relevant features, domain expert, scenario creation, scenarios, filter, granularity, source code, compiler, execution, profiles, interpretation of concept lattice, feature-unit, and "need for additional scenarios (incremental analysis)".]

[…] that is gained in previous iterations and can be applied repeatedly until sufficient knowledge about the system has been gained. The order of the activities is specified by the IDEF0 diagram in Fig. 6: An activity may start once its input is available. The activities are explained in the following sections.

Static Dependency Graph Extraction

The static dependency graph should subsume all types of entities and dependencies present in the dynamic dependency graph: It is unnecessary to extract dynamic information that is not used in the subsequent static analysis. Yet, the static dependency graph may provide additional types of entities and dependencies, and also more fine-grained information, if a static extraction tool is used that exceeds the capabilities of the available dynamic extraction tool. In this case, the static analysis can leverage less dynamic information but is still conservative. In our case studies, for instance, we extracted many detailed static dependencies among global declarations (routines, global variables, and […]

[…] Figure 7 shows how to map the identifiers used in the […]

[…] recorded.

If suitable tool support is available, a scenario's execution may be recorded selectively to exclude parts of the execution that are not relevant, such as start-up and shutdown of the system [24], [25], [26]. Certain debuggers, for instance, allow one to start and end trace recording. Instrumenting the source code so that only relevant parts are recorded is generally not an option because this requires that the feature-unit map is at least partially known already.

An alternative solution is to specify a special "start-end" scenario containing the actions to be filtered out. For instance, in order to mask out initialization and finalization code, the domain expert may prepare a "start-end" scenario in which the system is started and immediately shut down.

Since each scenario is a precise description of the sequence of user inputs that trigger actions of the system, every execution of a scenario yields the same execution profile unless the system is nondeterministic. In case of nondeterminism, one could either unite the profiles of all […]

[…] circle-radius. When she understands the differences between these two features, she would investigate other circle operations and additionally select Move-circle and Color-circle.

D.2 Concept Analysis

This process embodies a completely automated step that creates a concept lattice from the invocation table. In order to derive the feature-unit map by means of concept analysis, we have to define the formal context (i.e., the objects, the attributes, and the relation) and to interpret the resulting concept lattice accordingly.

The formal context for applying concept analysis to derive the relationships between scenarios and computational units is laid down as follows:
• Computational units will be considered objects.
• Scenarios will be considered attributes.
• A pair (computational unit u, scenario s) is in relation I if u is executed when s is performed.

Beyond these relationships between computational units and scenarios, further useful aspects between scenarios on […]

[…] common node toward the top element starting at the nodes to which u₁ and u₂ are attached. All scenarios at and above this common node are those jointly implemented by u₁ and u₂. For instance, setDiameter and color jointly contribute to Color-circle according to Fig. 10.
• Computational units jointly required for two scenarios s₁ and s₂ are described by the infimum µ(s₁) ⊓ µ(s₂). In the lattice, the infimum is the closest common node toward the bottom element starting at the nodes to which s₁ and s₂ are attached. All computational units at and below this common node are those jointly required for s₁ and s₂. For instance, setDiameter and draw are jointly required for Move-circle and Color-circle according to Fig. 10.
• Computational units required for all scenarios can be found at the bottom element; for instance, draw is required for all scenarios according to Fig. 10.
• Scenarios that require all computational units can be found at the top element. In Fig. 10, there is no such scenario.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          profiler




analyst
is to completely reverse engineer the system in order to                                                                                                                                                                                                                                                                                           tational units.                                                                                                                     for the relation in Fig. 4(a) are listed.                        • The objects of N are all objects at and below N .                                                                                                                                                               map
user-defined types) but the profiler we used let us extract         executions of the same scenario or differentiate each sce-




hey to d us that we have on y 12 pages
                                                                                                                                                                          may invoke multiple features and features may be invoked                                  and color to Color-circle.                                                                                                                      difficult for object-oriented languages (and for functional                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 general description of concept analysis in Sect. III to the




user
one hand and between computational units on the other




tool
exhaustively identify its components and to assign fea-                                     Computational unit. A computational unit is an exe-                                                                                                                                                                                                       The technique is suited only for features that can be in-     languages with higher-order functions) than for procedural            The set of all concepts of a given formal context forms a     • The attributes of N are all attributes at and above N .
only the dynamic call relationship among routines. This           nario execution. The latter is useful to identify differences     identifiers used in the specific instantiation of concept anal-
                                                                                                                                                                          by multiple scenarios. For instance, a scenario for movinghand may be derived:
tures to components. We integrated published automatic                                      cutable part of a system. Examples for computational                                                                                                                           scenario name                    actions performed                      voked from outside; internal implementation features, such       languages.                                                         partial order via the superconcept-subconcept ordering ≤:        For instance, the node in Fig. 5(b) marked with o1 and a1                                                                                                                                                                                                          way, we had to analyze static variable accesses that might        due to nondeterminism.                                           ysis within our method.
                                                                                                                                                                          a circle requires to draw the circle first, so this scenariostatic                                                                                                                                                                                                                                          • If γ(u1 ) < γ(u2 ) holds for two computational units u1
techniques for component retrieval in an incremental semi-                                  units are instructions (like accesses to global variables),                                                                                                                                                                                            as the use of a garbage collector, may not necessarily be                                                                                                                                            is the concept c4 = ({o1 }, {a1 , a4 , a6 , a7 }).                                                                                                                                                                                                                 have never been executed in any of our scenarios.
                                                                                                                                                                          also invokes feature “circle drawing”. There may be even                                         Draw-circle-diameter             draw a circle by diameter                                                                                  Static analyses need to make conservative assumptions                                                                                                                                                                                                              static dependency
dependency graph                    dependency                                                                                                                                                                  The system is used according to the set of scenarios, one              and u2 , then computational unit u2 is more specific with
automatic process, in which the results of selected auto-                                   basic blocks, routines, classes, compilation units, compo-                                                                                                                                                                                             deterministically and easily triggered from outside.                                                                                              (O1 , A1 ) ≤ (O2 , A2 ) ⇔ O1 ⊆ O2           (3)       For practical reasons, it is sometimes useful to apply only                                                                    graph extraction                                                                                        5                                                                                          D. Interpretation of Concept Lattice
                                                                                                                                                                          different scenarios all invoking the same set of features.                                        Draw-circle-radius               draw a circle by radius                                                                                 in the presence of pointers and dynamic binding, which                                                                                                                                                                                                                                                 2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       analysis
statically
at a time, and the execution profiles are recorded. Each                   respect to the given scenarios than computational unit u1
matic techniques are validated by the user [2].                                             nents, modules, or subsystems. The exact specification of                                                                                                                                                                                                                                                                                                                                                                                                    one of (8) or (9). For example if we have a large number of                                                                                                                                                                                                        B. Scenario Creation
                                                                                                                                                                          Each scenario, then, represents an alternative way of in-                                        Move-circle                      draw a circle by diameter              Scenarios. Scenarios are designed (or selected from existing     weaken the precision of the dependency graph. Fortu-               or, dually, with
validated                                                                               In this process step, a concept lattice for the relation      system run yields all executed computational units for a                  because u1 contributes not just to the features for which u2
   However, exhaustive methods are not cost-effective. For-                                  a computational unit is a generic parameter of our method.                                                                                                                                                                                                                                                                                                                                                                                                  attributes but just a small number of objects, we eliminate




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           dependency
feature−
                                                                                                                                                                          voking the features. For instance, FIG allows a user to                                                                           and move it                            test cases) to invoke a known set of relevant features; that     nately, research in pointer analysis has made considerable                                                                                                                                                                                                                                                                                                                          unit map             A domain expert is needed for creating the scenarios.           table created by process step 3 is built. The goals of inter-    single scenario; that is, one column of the relation table                contributes, but also to other features. For instance, color
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        the redundant appearance of attributes and keep the full




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           extractor




analyst
tunately, knowledge of components implementing a spe-                                       Feature. A feature is a realized functional requirement of                    push a button or to use a keyboard shortcut to begin a cir-                                      Color-circle                     draw a circle by diameter              is, we assume that the analyst knows in advance which            progress. There is a large body of work on pointer analy-                        (O1 , A1 ) ≤ (O2 , A2 ) ⇔ A1 ⊇ A2           (4)                                                                                                                                                                                                                                                                       Any available information on the system’s behavior (e.g.,         preting the resulting concept lattices are:                      can be filled per system run. Applying all scenarios that                  is more specific to Color-circle than setDiameter and set-




graph
list of objects in the concepts.                                                       human involvement
cific set of features suffices in many cases. Consequently,                                    a system (the term feature is intentionally defined weakly                     cle drawing operation. A set of scenarios each representing                                                                       and color it                           features are invoked by a scenario.                              sis for procedural languages [10], [11], [12], [13], [14], [15],                                                                                                                                                           (not part of IDEF0 notation)                                                                                                                                                documentation, existing test cases, domain models, etc.) is       1. Identification of the relationships between scenarios and      have been selected during the process of scenario selection               Diameter is more specific than draw according to Fig. 10.
                                                                                            because its exact meaning depends on the specific context).                    options and choices for the same feature resembles a use                                                     Fig. 2. Example scenarios for FIG.                             Because suitable scenarios are essential to our technique,    [16], [17] and object-oriented languages [18], [19] that re-         Note that (3) and (4) imply each other by definition. If                          IV. Analysis Process                                                                                                                                                                                                                             useful as input to him. Existing test cases may be useful         computational units (process steps 4.1–4.3)                      provides the relation table for formal concept analysis.                  • If µ(s1 ) < µ(s2 ) holds for two scenarios s1 and s2 , then
  T. Eisenbarth, R. Koschke, and D. Simon are with the In-                                  Generally, the term feature also subsumes non-functional                      case.                                                                                                                                                                    a domain expert is needed to set up scenarios. In many           solves general pointers, function pointers, and dynamic            we have c1 ≤ c2 , then c1 is called a subconcept of c2 and                                                                                                                                Fig. 6. Process for feature location in IDEF0 notation.                                                                                   but not necessarily directly applicable, because the focus        2. Identification of the relationships between scenarios and         Example. Figure 10 shows the concept lattice for the                   scenario s2 is based on scenario s1 because if s2 is executed,
stitute of Computer Science at the University of Stuttgart,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Our process to locate features is depicted in Fig. 6 using
Breitwiesenstrae 20–22, D-70565 Stuttgart, Germany.   E-mail:                               requirements. In the context of this paper, only functional                      Scenarios are used in our technique to gather the com-                                                                                                                cases, the domain expert can reuse existing test cases as        binding. These techniques vary in precision and costs.             c2 is called superconcept of c1 . For instance, in Fig. 4(b)                                                                                                                                                                                                                                                                        during testing is to cover the code completely and to com-        features and thus between features and computational             invocation table in Fig. 3, where all scenarios have been                 all computational units in the extent of µ(s1 ) need also to
{eisenbarth,simon,koschke}@informatik.uni-stuttgart.de.                                     features are relevant; that is, we consider a feature an ob-                  putational units for the relevant features through dynamic                                   Beyond simply identifying the computational units                           scenarios to locate features. However, the purpose of test       Interestingly enough, Milanova and others have recently            we have c4 ≤ c2 .                                                the IDEF0 notation [23]. It consists of five major activities:                                                                                                                                                                                                      bine features in many ways. Scenarios in our sense are very       units (process step 4.4)                                         selected.                                                  2              be executed. For instance, Move-circle and Color-circle
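To make the concept-analysis vocabulary above concrete, here is a minimal Python sketch. It assumes a hypothetical relation of four units and three scenarios (not the data behind Fig. 3, 4, or 10), where a unit is related to a scenario if it is executed when that scenario is performed. Concepts are enumerated by brute-force closure of every scenario subset. That method was chosen for brevity; it is exponential in the number of scenarios and is not presented as the construction algorithm the authors used. The sketch checks that orderings (3) and (4) agree. `gamma` and `mu` return the node where a unit or scenario is attached in the sparse lattice representation, and `more_specific` and `based_on` encode the two bullet rules on γ and µ. All names are illustrative.

```python
from itertools import combinations

# Hypothetical toy relation (NOT the paper's figure data):
# objects = computational units, attributes = scenarios,
# (u, s) related iff u is executed when s is performed.
EXECUTED_IN = {
    "u1": frozenset({"s1"}),
    "u2": frozenset({"s1", "s2"}),
    "u3": frozenset({"s2", "s3"}),
    "u4": frozenset({"s1", "s2", "s3"}),
}
UNITS = frozenset(EXECUTED_IN)
SCENARIOS = frozenset().union(*EXECUTED_IN.values())


def sigma(units):
    """Scenarios shared by all given units (all scenarios for the empty set)."""
    common = SCENARIOS
    for u in units:
        common = common & EXECUTED_IN[u]
    return common


def tau(scenarios):
    """Units executed in every one of the given scenarios."""
    return frozenset(u for u in UNITS if scenarios <= EXECUTED_IN[u])


def all_concepts():
    """Close every scenario subset; exponential, so toy relations only."""
    found = set()
    for k in range(len(SCENARIOS) + 1):
        for subset in combinations(sorted(SCENARIOS), k):
            extent = tau(frozenset(subset))
            found.add((extent, sigma(extent)))
    return found


def is_subconcept(c1, c2):
    """Ordering (3): (O1, A1) <= (O2, A2) iff O1 is a subset of O2."""
    return c1[0] <= c2[0]


def gamma(u):
    """Most specific concept whose extent contains unit u."""
    intent = sigma({u})
    return (tau(intent), intent)


def mu(s):
    """Most general concept whose intent contains scenario s."""
    extent = tau(frozenset({s}))
    return (extent, sigma(extent))


def more_specific(u2, u1):
    """True if gamma(u1) < gamma(u2), i.e. u2 is more specific than u1."""
    return gamma(u1)[0] < gamma(u2)[0]


def based_on(s2, s1):
    """True if mu(s1) < mu(s2), i.e. running s2 executes all units of mu(s1)."""
    return mu(s1)[0] < mu(s2)[0]


if __name__ == "__main__":
    for extent, intent in sorted(all_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

In this toy relation, u4 is executed in every scenario, so it sits at the bottom of the lattice and every other unit is more specific than it. u1 and u3 share no scenario, so their γ nodes are incomparable.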




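The observation that intersecting execution profiles singles out setRadius, move, and color can be reproduced in a few lines. The sketch below assumes hypothetical profiles for the four FIG scenarios of Fig. 2. The routine sets are invented for illustration and reuse routine names mentioned in the text. It reads "specific to a scenario" as "executed in that scenario and in no other", which is one simple reading and not the full concept-lattice interpretation.

```python
from functools import reduce

# Hypothetical execution profiles: the routine sets are invented for
# illustration and are not measurements taken from FIG.
PROFILES = {
    "Draw-circle-diameter": {"draw", "setDiameter"},
    "Draw-circle-radius": {"draw", "setRadius"},
    "Move-circle": {"draw", "setDiameter", "move"},
    "Color-circle": {"draw", "setDiameter", "color"},
}


def shared_by_all(profiles):
    """Routines executed in every scenario: the intersection of all profiles."""
    return reduce(set.intersection, profiles.values())


def specific_to(profiles, scenario):
    """Routines executed in `scenario` but in no other scenario."""
    others = set().union(*(p for s, p in profiles.items() if s != scenario))
    return profiles[scenario] - others


if __name__ == "__main__":
    print("shared:", sorted(shared_by_all(PROFILES)))
    for name in PROFILES:
        print(name, sorted(specific_to(PROFILES, name)))
```

With these assumed profiles, draw is shared by all scenarios, and setRadius, move, and color are each executed by exactly one scenario. setDiameter occurs in three scenarios and is therefore specific to none of them. That gap is where the finer ordering of the concept lattice becomes useful.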


Meanwhile, the editor-in-chief was replaced and the rules had changed.
The paper consisted mainly of two parts: the theoretical paper describing

[Process diagram: scenarios; source code is compiled for recording by the compiler (3.1), yielding an executable; scenario execution with the profiler (3.2) yields execution profiles.]

[Figure: (a) Invocation relation I over scenarios s1, s2, s3, features f1, f2, f3, and computational units u1–u7; (b) the corresponding concept lattice, with regions labeled Cspc and Irlvt.]

cause u7 is also used in scenarios not invoking f1 at all.
   Cspc: u1 and u2 are executed only in scenarios invoking f1. They are less specific than u4 because they are not used in all scenarios that invoke f1; that is, these computational units are only conditionally specific. Whether u1 and u2 are more or less specific than u7 is not decidable based on the concept lattice. On one hand, they are used in all scenarios invoking f1 and other scenarios, whereas u7 is also executed in scenarios that do not require f1. On the other hand, u7

and color-circle even if Draw-circle-diameter is not considered. However, Draw-circle-diameter is useful to separate draw from setDiameter.
   As a matter of fact, there could be several concepts for which condition (10) holds when different computational units are executed for the given feature, depending on the scenario contexts in which the feature is embedded. For instance, let us assume we are analyzing FIG’s undo capabilities. Three scenarios can be provided to explore this

However, this gives us only a set of computational units, but it is not clear which of these computational units are truly feature-specific and which of them are rather general-purpose computational units used as building blocks for other computational units. Given a feature f of interest, this question can be answered as follows:
• As a first approximation, all computational units in the extents of all feature-specific concepts for f jointly contribute to f.

E.2 Inspection of the Static Dependency Graph
   Next, we inspect the executable static dependency graph (as one specific subset of the static dependency graph) that contains all transitive control-flow successors and predecessors of computational units in $S_{\mathit{start}}(f)$. We concentrate on computational units here because they are the active constituents and because they were subject to the dynamic analysis. The executable static dependency graph can be annotated with the features and scenarios for which the

ful while navigating on the dependency graph:
• Strongly connected component analysis is used to identify cycles in the dependency graph: If there is one computational unit in a cycle that contains feature-specific code, all computational units of the cycle are related to the feature because of the cyclic dependency.
• Dominance analysis is used to identify computational units that are local to other computational units. A computational unit u1 dominates another computational unit

ment, the individual switch branches would be more clearly assigned to the respective feature in the concept lattice.
   In this section, we describe an incremental consideration of attributes, namely, scenarios. Incremental consideration of objects (that is, refinement of computational units) is analogous.
   As soon as one understands the basics of a system, one adds new scenarios for further detailed investigation and exploration of the unknown portions of the system. If one

mapped onto concepts in the superconcept along with possible user annotations. Additionally, an incremental automatic graph layout can be chosen: Only additional nodes and e
                                                                                                      user




                                                                                                                                                                          ({u1 , u4 , u6 , u7 }, {s1 })                                    ({u3 , u5 , u6 , u7 }, {s3 })       ({u1 }, {s1 })                                ({u3 }, {s3 })        is executed whenever f1 is required, whereas u1 and u2 are       feature:                                                                                                                                                                                              u2 if every path in the dependency graph from its root to                                                      tries to capture all features of a software at once, the re-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       • The analyst refines this approximation by adding and re-        computational units were executed. If a computational
                                                                                                                                                                                                                                                                                                                                                   not executed in some scenarios that do require f1 .              • Draw a circle: {draw-circle}                                     moving computational units: By inspecting the static de-         unit is not annotated with any scenario, the computational        u2 contains u1 . In other words, u2 can be reached only                                                        sulting lattice may become too large, too detailed, and thus
                                                    Fig. 8. The process for the dynamic analysis in Fig. 6.                                                                                                                                                                Spec                                                         Shrd       Shrd: u5 and u6 are executed in scenarios invoking f1 but        • Undo circle drawing: {draw-circle, undo}                         pendency graph and the source code of the computational          unit was not executed. Non-executable parts of the system,        by way of u1 . If a computational unit u is found to be                                                        unmanageable. If one starts with a smaller set of scenarios
                                                                                                                                                                                  ({u4 , u7 }, {s1 , s2 })                              ({u5 , u7 }, {s2 , s3 })             ({u4 }, {s1 , s2 })                         ({u5 }, {s2 , s3 })
                                                                                                                                                                                                                                                                                                                                                   they are also executed in scenarios not invoking f1 ; that is,   • Undo without preceding drawing operation: {undo}                 units, she sorts out irrelevant computational units; she may     namely, declarative parts, may be added once all relevant         feature-specific, then all its dominators are also relevant                                                     and further increases this set, all accumulated knowledge
                                                                                                                                    basic                                                                                                                                                                                                          they are shared with other features. These computational            For the overlapping scenarios {draw-circle, undo} and           also add feature-relevant computational units that were not      computational units have been identified. A static points-         to the feature, because they need to be executed in order                                                      an analyst gained while working with the smaller lattice




the method and the eva uat ons w th case stud es
                                                                                                                                                                                                                                                                                      Rlvt                              ({u6 }, {s1 , s3 }})       units are presumably less relevant than u1 and u2 , which        {undo}, we may assume that different computational units                                                                                                                                               for u to be executed. If none of a dominator’s dominatees                                                      has to be preserved. The lattice—the mental map for the
                                                                                                                              interpretation                                                                                       ({u6 , u7 }, {s1 , s3 }})                                                                                                                                                                                                                           executed due to an incomplete input coverage of the sce-         to analysis is needed to resolve dynamic binding and calls
                                   incremental




                                                                                                                                                     4.3                                                                                                                                                                                           are executed only when f1 is invoked, and also less relevant     will be executed beyond those that are specific to com-             narios. The concept lattice is an important guidance for                                                                           contains feature-specific code and the dominator itself is                                                      analyst’s understanding—changes when new scenarios are
                                   analysis




                                                                                           concept                                                                                                           ({u7 }, {s1 , s2 , s3 })                                                        ({u7 }, {s1 , s2 , s3 })                                                                                                                                                                                                                                   via routine pointers if present. The static points-to anal-
                                                                                                                                                                                                                                                                                                                                                   than u7 , which is executed in all scenarios invoking f1 .       mand draw-circle: Quite likely, additional computational           the analyst’s inspection of the dependency graph.                                                                                  not feature-specific, then the dominator is a clear cutting                                                     added. Fortunately, the smaller lattice can be mapped to
                                                                                                                                           analyst




                                                                                           lattice                                                                                                                                                                                                                                                                                                                                                                                                                                                      ysis may take advantage of the knowledge about actually
                                                                                                                                                                                            (b)Concept lattice for context in Fig. 11(a)                              (c)Sparse concept lattice of Fig. 11(b) categorized with respect             Irlvt: u3 is irrelevant to f1 because u3 is executed only        units will be executed to handle the erroneous attempt to                                                                           executed computational units yielded by the dynamic anal-         point as all its dominatees are local to it. Consequently,                                                     the larger one (the smaller lattice is the result of a so-called
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Example. For FIG’s ability to color a circle, the ana-
                    execution                                 invocation
                                                                                                                                                                                                                                                                      to feature f1 that has been exposed in scenarios s1 and s2 .                 in scenarios not containing f1 .                                 call undo without previous operation. Consequently, the                                                                             ysis.                                                             the dominator and all its dominatees can be omitted while                                                      subcontext).
                                                                                                                                senario                                                                                                                                                                                                                                                                                                                                                lyst will need to validate the set of computational units
                    profiles        scenario                  table                concept
                                                                                                                                feature                                                                                                                                                                                                               These facts are more obvious in the sparse representation     lattice will contain an own concept for {draw-circle, undo}                                                                            We primarily consider only those computational units ui        understanding the system.                                                                                      Definition. Let C = (O, A, I) a context, O ⊆ O, and
                                    selection                                      analysis                                                                                                                                                Fig. 11. Categorizing concept lattices.                                                                                                                                                                                                     {color, setDiameter, draw} according to the concept lat-
                                                      4.1                                     4.2                               mapping 4.4                feature−                                                                                                                                                                                of the lattice. Using this representation, given a feature       and another one for {undo}, where the latter is not a sub-                                                                          for which ui ∈ extent(cf ) holds because only those com-             If more than one feature is relevant, one simply unites                                                     A ⊆ A. Then C = (O , A , I ∩ (O × A )) is called a sub-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       tice in Fig. 10. The lattice shows that the analyst should
                                                                                                                                                           unit map
                                                                                                                                                                                                                                                                                                                                                   f , one identifies the concept, cf , for which the following      concept of the former. The infimum of these two scenarios                                                                            putational units are actually executed when f is invoked          the starting sets for each feature and then follows the same                                                   context of C and C is called a supercontext of C . 2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       start with inspecting color because this appears as the most
                                                                                                                                       analyst




                                                                                                                                                                          is still a simple way to identify computational units rele-                               feature, the computational units particularly relevant to f1                                                                                    will contain the computational units of the undo opera-                                                                                                                                               approach. For more than one feature, the concept lattice
                                                                                analysis
                                                                                concept




                                                                                                                                                                                                                                                                                                                                                   condition holds:                                                                                                                                                                                     according to the dynamic analysis. Hence, we combine
                                        analyst




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       specific computational unit for coloring a circle.         2
                                                                                                                                                                          vant to the actual features in the concept lattice, although                              are u4 and u7 .                                                                                                                                 tion executed for normal as well as exceptional execution,                                                                                                                                            identifies computational units jointly and distinctly used                                                         In our application of concept analysis, we add only new
                                                                                tool




static and dynamic information to eliminate conditional
                                                                                                                                                                          an unambiguous identification may require additional dis-                                     We notice that u7 is also used in all other scenarios, so                                                                                    whereas the concept representing {undo} contains the com-                                                                           static computational units executions in order to reduce          by those features.                                                                                             rows (one for each new scenario, assuming that scenarios
                                                                                                                                                                                                                                                                                                                                                                   cf = (U, S) and             sj = {f }    (10)                                                                       E.1 Building the Starting Set
                                  Fig. 9. The process for interpretation of the concept lattice in Fig. 6.                                                                criminating scenarios. The basic idea is to isolate features                              that one cannot consider u7 a specific computational unit                                                                                        putational units for error handling.                                                                                                the search space. Nevertheless, one should check for the             Once all relevant computational units have been identi-                                                     occur in rows of the relation table) but never new columns
                                                                                                                                                                                                                                                                                                                                                                                       sj ∈S
                                                                                                                                                                          in the concept lattice through combinations of overlapping                                for any of f1 , f2 , or f3 . Computational unit u4 , in con-                                                                                       In case of multiple concepts for which condition (10)              All computational units in the extent of a concept jointly    reasons why certain computational units have not been ex-         fied, other static (e.g., program slicing) as well as dynamic                                                   to the relation table (because we statically know all com-




e had to cut the paper and pub shed on y the theoret ca part
                                                                                                                                                                          scenarios.                                                                                trast, is used only in scenarios executing f1 . We therefore                     Concept cf is called a feature-specific concept for f .         holds, we can unite the computational units that are in            contribute to all features in the intent of the concept, which   ecuted.                                                           analyses (e.g., trace recording to obtain the order of execu-                                                  putational units in advance). Adding new rows leads to a
                                                                                                                                                                             If a scenario invokes several features, one can formally                               state the hypothesis that u4 is specific to f1 whereas u7                       Based on the feature-specific concept, one can categorize         Spec with respect to these concepts. If the identified con-         immediately follows from the definition of a concept. How-           Any kind of traversal of the executable static dependency      tion) can be applied to obtain further information. These                                                      new formal context (U, S , I ) in which relation I extends
                                                                                                                                                                          model a scenario as a set of features s = {f1 , f2 , . . . , fm },                        is not. Because there is no other scenario containing f1                       the computational units as follows:                              cepts are in a subconcept relation to each other, the su-          ever, there may also be computational units in the extent        graph is possible, but a depth-first search along the control-     analyses can be performed more goal-oriented by leveraging                                                     relation I.
                      Move−circle                    move Color−circle                 color Draw−circle−radius                                        setRadius          where fn ∈ F for 1 ≤ n ≤ m (F is the set of all relevant                                  other than s1 and s2 , computational unit u4 is the only                       Spec: all computational units u for which γ(u) = c holds.        perconcept represents a strict extension of the behavior of        that contribute to other features as well, so that they are      flow is most suited because a computational unit can be            the retrieved feature-unit map.
                                                                                                                                                                          features). This modeling is simplifying because it abstracts                              computational unit specific to f1 .                                             Rlvt: all computational units u for which γ(u) = c and           the feature. If the concepts are incomparable, these con-          not specific to the given feature. There may be computa-          understood only if all its executed computational units are                                                                                                                      Proposition. Let C = (O, A, I) and C = (O, A , I ),
                                                                                                                                                                          from the exact order and frequency of feature invocations                                    Note that this is just a hypothesis because other features                  c < c holds.                                                     cepts represent varying context-dependent behavior of the          tional units in the extent that do not contain any feature-                                                                        F. Incremental Analysis                                                                                        where A ⊆ A and I = (I ∩ (O × A )). Then every extent
understood. In a breadth-first search, a human would have
                                Draw−circle−diameter                            setDiameter                                                                               in a scenario. On the other hand, if the order or frequency                               might be involved to which u4 is truly specific and that are                    Cspc: all computational units u for which γ(u) = c and           feature.                                                           specific code at all. Thus, computational units in the ex-                                                                                                                                                                                         of C is an extent of C.                            2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        to cope with continuous context switches. The goal of the            There are at least two reasons why an incremental con-
[Slide image: a page of the paper, including Fig. 10, “Sparse concept lattice for Fig. 3.”]

Today the day has come to present the case study we conducted for




After 9 years, finally!




Sometimes you get a second chance in life.
The problem we were trying to solve in the paper can be explained very
simply with an example.

  • This is a screenshot of the drawing tool XFig. It allows you to draw
    graphical objects such as circles, rectangles, and text.
  • Suppose you were a developer assigned to extend XFig. For instance,
    your task is to add triangles.
  • As you know, XFig was developed by someone else, not you.
  • Most likely, you would first want to understand how XFig draws the
    existing objects.
  • The very first problem in doing so is to locate the code that implements
    these features.
Here is the call graph of XFig. Now, where would you start?
This problem is known as feature location. Another term used is concept
location.
Feature location answers the question “Where does this program do X?”,
as Norman Wilde phrased it back in 1992.
Norman Wilde is a pioneer in feature location. He received the
most-influential paper award for ICSM 1992 for his work on feature
location.
This year’s best-paper award went to a feature location paper, too.
There seems to be a tradition of most-influential papers related to
feature location at ICSM.

        Where does this program do X?
                         — Norman Wilde, 1994
His technique works as follows.
Because the technique is based on dynamic information, you need to
compile your program first.

[Diagram: source code → compiler → executable]
Then, you run the program, invoking the relevant feature X, and record
every piece of code that is executed.
All of that code is relevant to the feature. But some of it may also be
executed when other features are used, so it may contain code that is not
really specific to the feature of interest, for instance, the main function.

[Diagram: source code → compiler → executable; invoke with feature
(invoking input set I) → trace → profiler]
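The recording step can be sketched in Python with the standard `sys.setprofile` hook, which reports every entry into a Python function. This is a minimal illustration of a function-level profiler, not the tooling used by Wilde et al.; the toy routines `main`, `draw`, `draw_arc`, `set_centc`, and `set_cente` are hypothetical Python stand-ins for XFig's C routines, and the `shape` argument plays the role of the invoking input.

```python
import sys
from typing import Callable


def record_executed(entry_point: Callable[..., object], *args: object) -> set[str]:
    """Run entry_point(*args) and return the names of all Python functions
    entered while it ran (a crude, function-level profiler)."""
    executed: set[str] = set()

    def on_event(frame, event, arg):
        if event == "call":  # a Python function (not a C builtin) was entered
            executed.add(frame.f_code.co_name)

    sys.setprofile(on_event)
    try:
        entry_point(*args)
    finally:
        sys.setprofile(None)  # always remove the hook, even on exceptions
    return executed


# Hypothetical toy program standing in for XFig's drawing routines.
def set_centc() -> None: ...
def set_cente() -> None: ...


def draw_arc(shape: str) -> None:
    if shape == "circle":
        set_centc()
    else:
        set_cente()


def draw(shape: str) -> None:
    draw_arc(shape)


def main(shape: str) -> None:
    draw(shape)


if __name__ == "__main__":
    # Invoking input: draw a circle.
    print(sorted(record_executed(main, "circle")))
    # ['draw', 'draw_arc', 'main', 'set_centc']
```

Note that `main` and `draw` show up in the recorded set even though they are not specific to drawing circles, which is exactly why a second, excluding run is needed.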
For this reason, the program is executed once more, this time without
invoking the feature of interest. This gives you all the code that is
executed when the feature is not used.

[Diagram: source code → compiler → executable; invoke with feature
(invoking input set I) → trace → profiler; invoke w/o feature (excluding
input set E) → trace → profiler]
Now we have two sets: the code executed for the feature of interest, and the
code that is executed even though the feature was not invoked.
The difference between these two sets gives us the code that is more
specific to the feature of interest.

[Diagram: source code → compiler → executable; invoke with feature
(invoking input set I) → trace → profiler; invoke w/o feature (excluding
input set E) → trace → profiler; difference I−E → starting set for static
analysis]

                                                 — Wilde et al. 1992
Here is a simple example.
Let this be the dynamic call graph of XFig when the feature was executed.

[Dynamic call graph with feature: main, draw, draw arc, set centc]
Then we execute the program once more, this time without the feature.
We obtain this other red call graph, which overlaps with the other one.

[Dynamic call graphs: with feature (main, draw, draw arc, set centc);
without feature (main, load, draw, draw arc, set cente)]
We compute the difference between them and identify set centc as the
routine that was executed only for the feature of interest.

Problems of dynamic analysis
  • Results depend upon input and are, thus, incomplete
  • Set difference is binary: an element is either in the set or not
      – some of the code in the excluding input set may still be somewhat
        relevant to the feature
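The set difference in this example can be written down directly. A minimal Python sketch, with the routine sets read off the two dynamic call graphs of the XFig example; the function name `feature_specific` is our own, chosen for illustration.

```python
def feature_specific(invoking: set[str], excluding: set[str]) -> set[str]:
    """Routines executed with the feature (input set I) but never without it (E)."""
    return invoking - excluding


# Routine sets read off the two dynamic call graphs of the XFig example.
with_feature = {"main", "draw", "draw arc", "set centc"}
without_feature = {"main", "load", "draw", "draw arc", "set cente"}

print(feature_specific(with_feature, without_feature))  # {'set centc'}

# The difference is binary: "draw arc" also runs for the feature, but it is
# dropped entirely because it appears in both sets.
print("draw arc" in feature_specific(with_feature, without_feature))  # False
```

The second print illustrates the second problem listed above: routines shared with the excluding run vanish from the result, however relevant they may be.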
An alternative approach was proposed by Vaclav Rajlich, another pioneer
in concept location.
Vaclav proposed a static technique. Here, the idea is to extract a static
dependency graph. The user browses the call graph, and a tool supports
the navigation, similar to a web browser.

[Diagram: call graph extractor → call graph → call graph traversal; static
call graph over main, load, save, move, draw, draw arc, set text, set centc,
set cente, set ru ll]

Problems:
  • Where to start?
  • Where to continue?
  • When to stop?
  • Static analysis is difficult.
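The browsing idea can be sketched as a call graph with forward (callee) and backward (caller) navigation. The edges below are hypothetical: the slide lists the routines, but the exact call relations are assumed here for illustration. The `choose` callback stands in for the user, which is where the three open questions (where to start, where to continue, when to stop) remain unanswered by the tool.

```python
from typing import Callable, Optional

# Hypothetical static call graph (caller -> callees); edges assumed for illustration.
CALLS: dict[str, set[str]] = {
    "main": {"load", "save", "move", "draw"},
    "draw": {"draw arc", "set text", "set ru ll"},
    "draw arc": {"set centc", "set cente"},
}


def callees(routine: str) -> set[str]:
    """One step forward: routines called directly by `routine`."""
    return CALLS.get(routine, set())


def callers(routine: str) -> set[str]:
    """One step back: routines that call `routine` directly."""
    return {caller for caller, targets in CALLS.items() if routine in targets}


def browse(start: str, choose: Callable[[str, set[str]], Optional[str]]) -> list[str]:
    """Follow callee links from `start`. `choose` plays the user: it picks one of
    the offered callees to visit next, or returns None to stop."""
    path = [start]
    while True:
        options = callees(path[-1])
        nxt = choose(path[-1], options)
        if nxt is None or nxt not in options:
            return path
        path.append(nxt)


# A hypothetical user looking for circle drawing who follows names containing "draw".
def follow_draw(_current: str, options: set[str]) -> Optional[str]:
    return next((c for c in sorted(options) if "draw" in c), None)


print(browse("main", follow_draw))  # ['main', 'draw', 'draw arc']
print(sorted(callers("draw arc")))  # ['draw']
```

With this naive strategy the user stops at draw arc and never reaches set centc, a small instance of the "When to stop?" problem.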
Our technique combines these two ideas and generalizes from one feature
of interest to multiple features.
First, we run a dynamic analysis similar to Norman Wilde’s idea.

[Diagram: source code → compiler → executable; invoke feature f1 → trace →
profiler → routines (f1); invoke feature f2 → trace → profiler → routines (f2)]
We are interested in many features, not only one. We want to
understand the differences between drawing circles, rectangles,
and text, for instance.
For this reason, we execute the program more than once: at least once
for each feature of interest.
This gives us an invocation table. Each column in that table contains the
code that was executed for one feature.
Since we have many such columns, a simple set difference no longer
suffices. Instead, we use formal concept analysis, which I will describe
shortly.

[Diagram: source code → compiler → executable; invoke feature f1 … fn →
trace → profiler → routines (f1) … routines (fn) → invocation table →
concept analysis → concept lattice]
The information we obtain from formal concept analysis is then used to
help navigate the static call graph.
It tells us where to start, where to continue, and where to stop.
I will describe all these steps with an example.

[Diagram: source code → compiler → executable; invoke feature f1 … fn →
trace → profiler → routines (f1) … routines (fn) → invocation table →
concept analysis → concept lattice; source code → call graph extractor →
call graph → call graph traversal]
In our example of XFig, we are interested in its capability to draw
different graphical objects.
For each such object, we prepare one usage scenario or test case.
Each tries to execute only the feature of interest and as few other
features as possible.
For instance, we prepare four test cases or usage scenarios:
  • draw an ellipse
  • draw a circle
  • draw a rectangle
  • draw text

Scenarios
draw Ellipsis
draw Circle
draw Rectangle
draw Text
Here is the result of the dynamic analysis: the invocation table.
Each column describes the set of routines executed for the respective
feature.
In Norman Wilde’s approach, we would have two columns. Here we have
many.
Consequently, a simple binary set difference is no longer possible.
Instead, we are using formal concept analysis.

Invocation Table

            drawEllipsis  drawCircle  drawRectangle  drawText
main             ×            ×             ×           ×
draw             ×            ×             ×           ×
draw arc         ×            ×
set centc                     ×
set cente        ×
set ru ll                                   ×
set text                                                ×
Formal concept analysis is a mathematical technique to analyze binary
relations. An invocation table is such a binary relation.
In general, formal concept analysis can analyze arbitrary binary relations.
It is based on:
  • a set of objects → routines
  • a set of attributes → feature scenarios or test cases
  • a binary relation between these objects and attributes; it describes which
    object possesses which attributes → invocation table

Invocation Table = relation R ⊆ O × A
(rows: set of objects O; columns: set of attributes A)

            drawEllipsis  drawCircle  drawRectangle  drawText
main             ×            ×             ×           ×
draw             ×            ×             ×           ×
draw arc         ×            ×
set centc                     ×
set cente        ×
set ru ll                                   ×
set text                                                ×
Given the relation, you can define a function that yields the set of common attributes for a given set of objects.
For instance, the common attributes of main, draw, and draw arc are drawEllipsis and drawCircle.
You can spot them in the invocation table as a completely filled rectangle.

Common attributes for $O \subseteq \mathcal{O}$:

$$\sigma(O) := \{a \in \mathcal{A} \mid (o, a) \in R \ \forall o \in O\}$$
Analogously, you can define a function that yields all objects that have a given set of attributes.
In this example, the common objects for drawEllipsis and drawCircle are main, draw, and draw arc.

Common objects for $A \subseteq \mathcal{A}$:

$$\tau(A) := \{o \in \mathcal{O} \mid (o, a) \in R \ \forall a \in A\}$$
Given these two functions, you can define a formal concept. It is a pair of a set of objects and a set of attributes such that the attributes are exactly those shared by all of the objects, and the objects are exactly those that have all of the attributes.
For example, main, draw, and draw arc together with drawEllipsis and drawCircle form a formal concept.

Formal concept $c = (O, A)$:

$$A = \sigma(O) \wedge O = \tau(A)$$
Intuitively, you are searching for maximal filled rectangles in the invocation table, where you may permute rows and columns.
The set of all concepts of the invocation table is listed here.
Now, let us pick two of these concepts and take a closer look.

c1 = ({main, draw}, {drawEllipsis, drawCircle, drawText, drawRectangle})
c2 = ({draw arc, main, draw}, {drawEllipsis, drawCircle})
c3 = ({set cente, draw arc, main, draw}, {drawEllipsis})
c4 = ({set centc, draw arc, main, draw}, {drawCircle})
c5 = ({set text, main, draw}, {drawText})
c6 = ({set ru ll, main, draw}, {drawRectangle})
c7 = ({set ru ll, set text, set centc, set cente, draw arc, main, draw}, ∅)
For instance, we pick these two concepts:

({draw arc, main, draw}, {drawEllipsis, drawCircle})
({set centc, draw arc, main, draw}, {drawCircle})

We see that the objects of the first one are a subset of the objects of the second one.
Likewise, the attributes of the second one are a subset of the attributes of the first one.
The second one has fewer attributes; consequently, more objects have these attributes.
If you think of a concept as a class in an object-oriented programming language, this observation would be expressed as a superclass/subclass relation: the first concept has all attributes of the second one plus additional attributes.
This allows us to define an ordering between concepts, analogous to subclassing.
A concept c1 is smaller than a concept c2 if all objects of c1 are contained in c2 or, dually, if all attributes of c2 are contained in c1.

Let $c_1 = (O_1, A_1)$ and $c_2 = (O_2, A_2)$ be concepts;

$$c_1 \le c_2 \;:\Leftrightarrow\; O_1 \subseteq O_2$$

or dually

$$c_1 \le c_2 \;:\Leftrightarrow\; A_2 \subseteq A_1$$
In that case, c2 is called a superconcept of c1, and c1 is a subconcept of c2.
This partial order forms a lattice, called the concept lattice.
Lattices can be visualized with Hasse diagrams.
Here, we see the Hasse diagram of our example.
The nodes are the concepts, and the edges represent the partial order; by convention, edges are directed from bottom to top. That is, superconcepts are at the top, and subconcepts are below.
We see the attributes in blue. In our case, these are our features of interest.
We also see the objects, which are the routines executed for these features.
There are two special concepts, namely, the top and the bottom element.
The top element consists of all objects and the attributes they all share (none, in our example).
The bottom element consists of all attributes and the objects that possess all these attributes.
By the definition of the ordering of concepts, every subconcept has all attributes of its superconcepts.
Likewise, every superconcept has all objects of its subconcepts.
That is, there is a lot of redundancy in this Hasse diagram.

Hasse diagram
[Figure: Hasse diagram of the example with concepts 0 to 6. Node 1 is the top element (all seven routines, no feature); nodes 2 (drawEllipsis), 3 (drawCircle), and 4 and 5 (drawText and drawRectangle) lie below it; node 6 ({drawEllipsis, drawCircle} with draw arc, main, draw) lies below 2 and 3; node 0 is the bottom element (all four features, routines main and draw). Every node lists all of its attributes and objects.]
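The sparse Hasse diagram avoids this redundancy.
Each object and each attribute is listed only once, at the concept where it first appears: an attribute at the highest concept that contains it, an object at the lowest concept that contains it.
By the definition of the ordering of concepts, we can infer where they also appear in the lattice: attributes are inherited downward, objects upward.
The sparse representation is much more readable.

Sparse Hasse diagram
[Figure: sparse Hasse diagram of the same lattice: drawEllipsis and set cente at node 2, drawCircle and set centc at node 3, drawText with set text and drawRectangle with set ru ll at nodes 4 and 5, draw arc at node 6, main and draw at the bottom node 0; the top node 1 carries no label.]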
We can use the sparse Hasse diagram in combination with the static call graph as follows.
If we want to know which routines are specific to a given feature, we simply look for that feature in the lattice.
Let us assume we are interested in the feature drawCircle.
If the concept at which this feature occurs has no other feature, all routines listed at this concept are specific to this feature.
In our example, we would start browsing the call graph at routine set centc.
This information could as well have been obtained by simple set difference operations.
But the lattice provides more information. If we look at the subconcept of concept 3, we find concept 6, which is annotated with routine draw arc. This routine also contributes to feature drawEllipsis because concept 6 is also a subconcept of concept 2.
Thus, draw arc serves two features: it is also required for feature drawEllipsis, but it is less specific than set centc.
Yet, it is more specific than main and draw, which are listed at concept 0, a transitive subconcept of concept 3. While set difference is binary, the lattice gives a finer ranking of feature specificity.
That is, you would continue your navigation of the static call graph at draw arc, and then also look at main and draw.

[Figure: sparse Hasse diagram (left) next to the static call graph of the drawing tool (right), with routines main, load, save, move, draw, draw arc, set centc, set cente, set ru ll, and set text.]
In later work, we extended this approach to handle cases in which there is no one-to-one mapping between features and scenarios. Furthermore, we used concept analysis incrementally, so that you can start with a small set of features and then extend it to a larger set without losing your previous knowledge.
Now we come to a case study that was not published in our TSE paper.
We tried this technique in an industrial case study on the machine shown here.
This machine is a chip tester. It is used by chip manufacturers to check whether a chip works correctly before it is shipped.
A robot puts the chip into the machine, and the machine runs various tests that can be programmed by a test engineer.
The software architecture of the firmware of this chip tester is sketched here.
The firmware provides the basic operations used by various applications, that is, by tools to implement, configure, run, analyze, and visualize tests.
Because these applications run in parallel, the first layer of the architecture provides synchronization and means to exchange input and output.
There is a programming language for writing these tests. The inputs to the firmware are such programs. They are first parsed and then executed.
The firmware is written in C, and for each operation of this programming language there is exactly one C function that executes it. These functions are called executors.
The executors use shared utility functions to execute the operation.
This architecture looks very tidy and structured. The truth, however, is that 90% of the code is hidden in the box labeled utility functions.
Nobody had a clear picture of which executors shared which utility functions.

[Figure: layered architecture: applications on top; the firmware with semaphore, shared memory, and message queue, a YACC parser with command, constructor, and response components, several executors, and the shared utility functions; the hardware at the bottom; arrows distinguish data flow and control flow.]
Here is the static call graph of the firmware.
It consists of roughly 10,000 routines and is very complex.
We analyzed 76 different operations of this programming language.
Related operations can be grouped into categories. For instance, there are operations for the configuration setup, relay control, and many more.
Since the operations of one category are semantically related, we would assume that they also share a lot of utility functions.
That was one of the hypotheses we investigated.

Configuration Setup
  CNTR, CNTR?, CONF, CONF?, UDEF, UDPS, UDGP,
  DFPN, DFPN?, DFPS, DFPS?, DFGP, DFGP?, DFGE, DFGE?,
  PALS, PALS?, PSTE, PSTE?, PSFC, PSFC?, PQFC, PQFC?,
  PACT, PACT?
Relay Control (Test Execution)
  RLYC, RLYC?
Level Setup Commands
  LSUS, LSUS?, DRLV, DRLV?, RCLV, RCLV?, TERM, TERM?
Timing Setup Commands
  PCLK, PCLK?, DCDF, DCDF?, WFDF, WFDF?, WAVE, WAVE?,
  ETIM, ETIM?, BWDF, BWDF?
Vector Setup Commands
  SQLA, SQLB, SQLB?, SQPG, SQPG?, SPRM, SPRM?, SQSL, SQSL?
Misc.
  FTST, VBMP, PSLV, CLMP, WSDM, DCDT, CLKR, VECC,
  SDSC, SREC, DMAS, STML
To locate these 76 features, we provided one test case for each.
To factor out all C functions that are executed for all commands, we added one test case that contained only the NOP command, which does nothing at all.
Because some commands allow variant parameters, we added further test cases to cover these, too.
To factor out code for startup, shutdown, etc., we added one test case in which the firmware was started and immediately shut down again.
Because some of the commands had certain preconditions, we added test cases to fulfill these preconditions.
In total, we had 93 test cases.

  real         76   scenarios for relevant commands
                1   scenario for NOP command
  additional    2   additional parameter combinations
  factoring     1   start-end
               13   scenarios for preparing steps
  total        93   scenarios
Here is the resulting concept lattice for our study. The height of the concepts is proportional to the number of routines contained therein, except for the bottom element.
In the first layer of the lattice, you find the code of the executors and all functions that were executed only by these executors.
Below that layer, you can see which utility functions are shared by which executors.
Furthermore, we could confirm our hypothesis that executors of the same category have more utility functions in common.
To validate our findings, we asked one developer of this firmware whether this lattice made any sense to him.
It did, and he learned new things he had not known before.
In another case study, published at ASE, we evaluated our technique on two C compilers, namely, SDCC and cc1, the C compiler of GCC.
The motivation of this study was to evaluate whether a finer-grained dynamic analysis is feasible and pays off, and whether the technique scales to very large feature sets.
The features of interest were different loop constructs and mathematical expressions in C. In addition, we looked at different compiler optimization options.

Study of SDCC / GCC (cc1)

Features of interest:
     Loops: do-while, while, for, if-goto
     Mathematical expressions: +, -, *, /, int literals
     Optimization options
In the earlier study, we traced routines. In this compiler study, we wanted to trace at the statement level.

Granularity Routine vs. Statements
For instance, constructs such as this one are expected in compilers written in a procedural language.
The routine handle() would be called for all loop constructs, but not all of its code is executed for each loop construct.
If we trace at the level of basic blocks, we are able to find the code within handle() that is specific to handling, for instance, the DO loop (do-while) in C.

void handle(...)
{
    switch (...)
    {
        case DO: ...
        case WHILE: ...
        case FOR: ...
        ...
    }
}
In terms of the concept lattice, tracing basic blocks is a refinement of tracing at the routine level.

[Figure: routine-level concept annotated with T1, T2; F1, F2, F3; and the routine R = {B0, B1, B2}.]
That is, a concept in the lattice for routines may be split into several concepts in the lattice for basic blocks.
Thus, you gain more detail.
The additional level of detail does not come for free, however. The dynamic analysis becomes more expensive. Even worse, the lattice becomes bigger: in the worst case, the number of concepts grows exponentially with the number of attributes and objects. So we were wondering whether tracing at the basic block level is feasible.

[Figure: the routine-level concept (T1, T2; F1, F2, F3; R = {B0, B1, B2}) split at the basic block level into concept 1 (T1, T2; F1; B0), concept 2 (T1; F1, F2; B0, B1), and concept 3 (T2; F1, F3; B0, B2).]
Here are some size numbers concerning the input for concept analysis when tracing at the level of either routines or basic blocks.
The analysis at the basic block level was slower by a factor of 50 to 200, but we could in fact compute the lattice in all cases.
Furthermore, with the analysis at the basic block level, we could find details that could not have been found at the routine level.
                           sdcc       cc1
#routines                 1,325    15,986
#routines executed          650     2,657
#basic blocks            46,699   379,086
#basic blocks executed   10,113    34,602
While we had 76 different features in the earlier study, we wanted to see whether the technique still scales to even larger feature sets. If you can combine features freely, there is easily a combinatorial explosion of possible features. Therefore, we looked at 100 different C language constructs in combination with different compiler backends and additional command-line options.
Scalability for Large Feature Sets
features
    100 test cases for 100 C language constructs
    one/multiple backends
    no/two compiler switches
If we use only one backend of SDCC and no compiler switches, the lattice has about 80,000 concepts.

sdcc
 1     one backend, no compiler switches:             → 80,000 concepts
If we use multiple backends of SDCC and two compiler switches, the lattice has about 4.5 million concepts.
 2     multiple backends, two compiler switches:      → 4.5 million concepts
If we use the simple configuration for cc1, the input to concept analysis is a table in which 1.3 million entries are set. For this size, we were not able to compute the lattice. So the approach does not scale to large sets of feature combinations; the lattice should be computed only on demand for subsets of features.

cc1    one backend, no compiler switches:             → invocation table has 1.3 million entries
                                                      → lattice cannot be computed
Now, let’s turn to the question of what others have done. If you are interested in this question, I recommend reading this upcoming paper by our hosts. They have written a very nice survey of papers on feature location that will soon appear in the Journal of Software Maintenance and Evolution. I know they keep renaming this journal, but I stick to this name.
Feature Location in Source Code: A Taxonomy and Survey
Bogdan Dit, Meghan Revelle, Malcom Gethers, Denys Poshyvanyk
The College of William and Mary
Journal of Software Maintenance and Evolution, to appear
Denys and his colleagues have reviewed 89 articles from 25 venues and classified them within a taxonomy.
Here is the distribution of these papers across venues. ICSM is second. The premier conference for feature location seems to be ICPC. However, the chances of getting an award for a feature location paper are higher at ICSM.
There have been several improvements to the dynamic analysis. Some researchers, for instance, take the frequency of execution into account. The intuition is that the more often code is executed, the more relevant it is. They have also improved the recording of traces: you start and stop recording while the program is running, so that you observe only the execution right after you have triggered the feature of interest.
In addition, textual approaches based on methods from information retrieval have emerged. Andrian Marcus is one pioneer in this field, and Denys Poshyvanyk has continued this work.
Dynamic Analyses
Static Analyses
Textual approaches
The authors have summarized several open issues in feature location. I list here those that I find most important.
We have many competing feature location techniques, but we have no clear picture yet of when to use which. There was one experiment by Vaclav Rajlich and Norman Wilde, in which they compared their static and dynamic approaches. But there is no comprehensive evaluation, nor are there accepted benchmarks. Luckily, Denys and colleagues have started to create some.
There is no Eclipse plugin for feature location, other than perhaps prototypes. The techniques we developed are not really used in the field. We have not yet found the right way to integrate such tools smoothly into the developer’s toolkit.
To do so, we must better understand how programmers locate features. There are some initial observational studies. We need more of these, and we also need tool evaluations with real programmers.
Open Issues
    accepted evaluation procedures and benchmarks
    tool adoption in industry
    user studies
Finally, let me conclude with a quote from a senior researcher, made during a panel at CSMR 2009 in Kaiserslautern. He said that feature location is irrelevant in industry. I have never had the chance to ask him what he meant by this statement. I personally do sometimes need to locate features in my code. Regrettably, I still mostly use grep.

“Feature location is irrelevant in industry.”

                        Senior Researcher, CSMR 2009, Kaiserslautern

ICSM'01 Most Influential Paper - Rainer Koschke

Hence, we considered as main features the following circle by radius circle by diameter no feature attached and represents the components shared For feature localization, Chen and Rajlich [5] propose a directly necessary for any feature will still appear in the would also have to take time into account. dependencies, respectively; e.g., the abilities to draw text early phases should give information on the feature com- all scenarios and the set of features are then subject to con- The set of all concepts of a given formal context and components in the top element are superfluous (such • concept analysis tool concepts [8], [11]Perry, D., ‘Generic Architecture Descriptions for Product are many approaches to newly developing product families down as follows; For instance, the node in Figure 3 marked with o2 and clearly shows which lower-level components should be For a first analysis to obtain a simplified feature compo- two capabilities: for drawing polygons, polylines, and splines. These com- 6 semi-automatic method, in which an analyst browses the concept lattice when we do not subtract execution traces Note also that the technique is not suited for features and circles/ellipses are widely independent from other ponent map quickly and with simple means. To this end, cept analysis. Concept analysis gives information on rela- the partial order ≤ form a complete lattice, called concept investigated further to obtain composite components, components will not exist when the set of objects for ellipse by radii ellipse by diameters Figure 6. Relevant parts for Lines’, Proc. of the Second International ESPRIT ARES from scratch [2, 11]. However, according to Martinez [16], • components will be considered objects, a5 is the concept ({o2, o4}, {a3, a4, a5}). nent map, one can also ignore variables and come back to • graph editor Graphlet [3] to visualize the concept lat- 1. 
ability to draw different shapes (lines, curves, rectan- ponents are no real drawing operations but operations to circles and ellipses statically derived dependency graph; navigation on that for an excluding input set. It is true that these components that are only internally visible, like whether a compiler shapes. Related features were grouped together in the con- Workshop, Lecture Notes in Computer Science 1429, pp. 51- the product line analyst imparts all relevant features, for tionships between features and required components as lattice L: reverse engineering may generally pay off (in order to concept analysis contains only components executed tice, closed approx. spline approximated spline Figure 7. Concept lattice for second experiment. graph is computer-aided. Since the analyst more or less 56, Springer, 1998 most successful examples of product families at Motorola a3, a4 these in a later phase using more sophisticated dynamic or gles, etc.) keep a log of the points set by the user and to draw lines Nodes #41, #42, #43, and #44 represent the features to cannot be distinguished from components that in fact con- uses a certain intermediate representation. Strictly speak- cept lattice, which allowed us to compare our mental which the necessary components need to be detected, to well as feature-feature and component-component depen- • features will be considered attributes, O A detect cohesive modules, we have developed a semi-auto- at least once, which is the case if a filter ignores all This lattice consists of 22 concepts, three of them pro- takes on all the search, this method is less suited to quickly originated in a single separate product. Only in the course dencies. L(C) = { ( O, A ) ∈ 2 × 2 A = σ(O) ∧ O = τ( A)} a1, a2 static analyses. subprograms for which the profiler reports an execu- • and two more short Perl scripts to convert the file for- closed interpol. 
spline interpolated spline between set points while the user is still setting points (a draw circles and ellipses using either diameter or radius. tribute to all components because both kinds of compo- ing, internal features may be viewed as implementation model of a drawing tool to the actual implementation of [12]Sahraoui, H., Melo. W, Lounis, H., and Dumont, F. (1997), the reverse engineer who in turn delivers the feature com- • a pair (component c, feature f) is in relation R if c is a5 a6, a7, a8 matic method integrating many automatic state-of-the-art 2. ability to modify shapes in different editing modes vide the specific functionality for the respective shapes. and cheaply derive the feature component map. Moreover, ‘Applying Concept Formation Methods to Object Identifica- of time, a shared architecture for a product family evolved. mats of concepts and Graphlet (all Perl scripts polygon polyline spline first appears as polygon and is only re-shaped when They all contain three specific components to draw the nents jointly appear at the bottom element. However, the details. However, such implementation details may be of Xfig. The lattice also classified components according to ponent map. On the basis of the feature component map executed when f is invoked. The infimum of two concepts in this lattice is com- o1 o2 o3 techniques [9]). tion count of 0). (rotate, move, copy, scale, etc.) Concept #1 (21 functions) depicts the functionality for tion in Procedural Code’, Proc. of the Conference on Auto- Moreover, large investments impose a reluctance against feature F 3.2. Interpretation of the Concept Lattice together have just 147 LOC). rectangular box the user has set all points). object, to plot an elastic bend while the user is drawing, the method relies on the quality of the static dependency idea of an excluding input set can be taken over to our interest for defining a product family architecture. 
Internal their abstraction level, which is a useful information for and additional economic reasons, a decision is made for puted by intersecting their extents as follows: o4 rectangular box splines and concept #2 (17 functions) represents the one mated Software Engineering, Nevada, pp. 210-218, introducing a product family approach that ignores exist- usage scenario However, here – for the time being – we will use as an < Alternative (2) can be chosen if suitable documentation • If the bottom element does not contain any compo- The fact that the subprograms are extracted from the We conducted two experiments. In the first one, we with rounded corners Concept #2 stands for the feature “draw arc” and con- and to resize the object. Note the similarity of the compo- graph. If this graph, for example, does not contain infor- technique to distinguish these two kinds of components by features can only be detected by looking at the source, re-use; general components can be found at the lower November, IEEE Computer Society. ing assets. Hence, an introduction of a product family particularly interesting and required components, and fur- abstract example the binary relation between arbitrary ( O 1, A 1 ) ∧ ( O 2, A 2 ) = ( O 1 ∩ O 2, σ ( O 1 ∩ O 2 ) ) is not available but there is reason to trust the program- Concept analysis applied to the formal context nent, all features in the bottom element are not imple- investigated the ability to draw different shapes only. In regular polygon arc for lines (used for polygons). Both are dependent on con- mation on potential values of function pointers, the human execution trace Figure 3. Sparse representation of Figure 2. object code makes the implementation independent from cept #7 is again a concept that represents shared compo- nent names. 
The specific commonalities among circles and providing a usage scenario in which no feature is invoked, because it is not clear how to invoke them from outside level, specific components at the upper level. Moreover, ther expensive analyses regarding quality can be cost- mers of the system to a great extent. In all other cases, one described in the last section gives a lattice, from which mented by the system (this constellation will not the second one, we analyzed the ability to modify shapes. picture object text cept #4 (29 functions) that groups functions related to analyst may miss functions only called via function point- [13]Siff, M. and Reps, T., ‘Identifying Modules via Concept approach has generally to cope with existing code. required components C1 …Cn objects and attributes shown in Table 1. An object oi has The infimum describes a set of common attributes of the programming language to a great extent (as long as the nents for drawing elastic lines while the user is setting ellipses are represented by node #38, which introduces the like simply starting and immediately shutting down the and how to derive from an execution trace whether these the lattice showed dependencies among components, Analysis’, Proc. of the Int. Conference on Software Mainte- effectively aimed at selected components. will fall back on alternative (3). However, for alternative interesting relationships can be derived. These relation- exist, if there is a usage scenario for each feature and The second experiment exemplifies combined features library object points. Concept #3 (20 functions) denotes the ellipse fea- ers. At the other extreme, if the too conservative assump- Reverse engineering may help creating a product fam- attribute aj if row i and column j is marked with an ! in two sets of objects. Similarly, the supremum is deter- points. 
The difference between concept #7 and concept #4 shared components to draw circles and ellipses (both spec- system without invoking any relevant feature. That simple features are present or not. However, we assume that which need to be known when components are to be This paper describes a quickly realizable technique to (F, C1), …(F, Cn) ∈R 3. Feature Component Map (3), concept analysis may additionally yield hints on sets ships can be fully automatically derived and presented to every usage scenario is appropriate and relevant to the language is compiled to object code) and has the advan- composed by basic features. For the second experiment, a ture, concept #5 (29 functions) the general drawing sup- nance, Bari, pp. 170-179, October, 1997, IEEE Computer ily for existing systems by identifying and analyzing the mined by intersecting the intents: is that the former only contains the components to draw ified by diameter and radius). tion is made that every function whose address is taken is trick separates the two kinds of components in two distinct externally visible features are generally more important. extracted. Society. ascertain the feature component map based on dynamic concept analysis Table 1 (the example stems from Lindig and Snelting [7]). the analyst such that the more complicated theoretical system; a system may indeed not have all features, tage that no front end is necessary. On the other hand, Figure 5. Xfig’s object shapes. port functionality and concept #6 (123 functions) the start- components and also by deriving the individual architec- of related subprograms forming composite components. shape is drawn and then modified. 
Both draw and modify the elastic line, while the latter adds the capability to set an Nodes #32 and #39 connect the circles and ellipses to called at each function pointer call site, the search space concepts, C1 and C2, in the lattice where C1 < C2 and The invocation for externally visible features is com- As future work, we want to explore how results information (gained from execution traces) and concept feature component map For instance, the following equations hold for this table, ( O 1, A 1 ) ∨ ( O 2, A 2 ) = ( τ( A 1 ∩ A 2), A 1 ∩ A 2 ) In order to derive the feature component map via con- background can be hidden. The only thing an analyst has i.e., a usage scenario may be meaningless for a given because a compiler may replace source names by link The resulting lattice for this experiment is shown in up and initialization code of the system. [14]Snelting, G., ‘Reengineering of Configurations Based on ture from each system. These individual architectures may The relation for the formal context necessary for con- constitute a basic feature. Combined features add to the arbitrary number of points. Splines do not need this capa- the other objects. No components are attached to nodes increases extremely. Generally, it is statically undecidable paratively simple when a graphical user interface is avail- obtained by the method described in this paper may be Mathematical Concept Analysis’, ACM Transactions on Soft- analysis. The technique is automatic to a great extent. and dependencies also known as relation table: cept analysis, one has to define the formal context to know is how to interpret the derived relationships. This names in the object code (for instance, C++ compilers use Figure 4. 
The contents of the concepts in the lattice are Analyzing concepts #1, #2, and #3, we found that the C1= ⊥ and C2 contains only those components that are then be unified to a platform architecture and the derived The supremum ascertains the set of common objects, cept analysis is defined as follows: system). effort needed to derive the feature component map as there bility because they are defined by exactly three points. #32 and #39, they only merge components from different which paths are taken at runtime, so that every static anal- able (as it was the case in our case study). Then, usually combined with results of additional static analyses. For ware Engineering and Methodology 5, 2, pp. 146-189, April, Concept analysis is a mathematical technique to investi- σ ( { o 1 } ) = { a 1, a 2 } and τ ( { a 7, a 8 } ) = { o 3, o 4 } (objects, attributes, relation) and to interpret the resulting name mangling to resolve overloading) there is not always omitted for readability reasons. However, their size in this shapes provide individual rotate functions. In other words, really required for all components in a narrower sense. components may be used to populate the unified architec- Figure 1. Overview. which share all attributes in the intersection of two sets of (C, F) ∈ R if and only if component C is required section explains how interesting relationships can be auto- Beyond these relationships between components and are many possible combinations. ysis will yield an overestimated search space, whereas 1997. concept lattice accordingly. a direct mapping from the subprograms in the execution Concept #6 represents the feature “draw lines” and is concepts. The two nodes have a direct infimum (not shown Furthermore, our technique goes beyond Wilde and only a menu selection or a similar interaction is necessary. example, we want to investigate the relation between the gate binary relations (see Section 2). attributes. matically derived. 
In both experiments, we considered subprograms as picture is a linear function of their number of components the rotate feature is implemented specific to each shape, dynamic analyses exactly tell which parts are really used [15]Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies ture. To this end, code needs to be adjusted, reengineered, We want to point out that not all non-functional when feature F is invoked; a subprogram is features, further useful aspects between features on one trace back to the original source. Because we dealt in our used for drawing rectangles, polygons, and polylines, as in Figure 6) and add the same components to the circle and In the case of a batch system, one may vary command line concept lattice based on dynamic information and static a1 a2 a3 a4 a5 a6 a7 a8 As already abstractly described in Section 2, the fol- components. However, in our simple implementation, we (except for the bottom element that contains 136 compo- i.e., there is no generic component that draws all different Scully’s technique in that it also allows to derive relevant Using Concept Analysis’, Proc. of the ACM SIGSOFT Sym- or wrapped. However, changing or wrapping the code is Integration into a Product Family Process. A simple requirements, e.g., time constraints, can be easily mapped Graphically, the concept lattice for the example relation 3.1. Context for Feature and Components required when it needs to be executed; a global hand and between components on the other hand may be one would expect. The generality of this feature becomes ellipse features. The components inherited via these two at runtime (though for a particular run only). However, switches and may have to provide different sets of test data software architecture recovery techniques. o1 ! ! lowing base relationships can be derived from the sparse case study with C code, object code names were identical do not handle variable accesses. 
Hence, not all required nents, mostly initialization and GUI code and very basic shapes, which would have been an interesting finding in relationships between components and features by means posium on the Foundations of Software Engineering, pp. 99- only done in very late phases in moving toward a product process for feature-based reengineering toward product to components, i.e., our technique primarily aims at func- in Table 1 can be represented as a directed acyclic graph variable is required when it is accessed (used or derived: immediately obvious in the concept lattice as it is located nodes are very basic components of the lowest regions of Chen and Rajlich’s technique could be helpful in a later to invoke a feature. However, in order to find suitable test o2 ! ! ! representation of the lattice (note the duality in the inter- to source names. If this is not the case, one either tolerates functions, and was too large to be drawn accordingly; as a terms of reuse. of concept analysis, whereas Wilde and Scully’s technique 110, November, 1994. family. Reverse engineering can also assist in earlier families can be described as follows: tional features. However, in some cases, it is possible to whose nodes represent concepts and whose edges denote Components will be considered objects of the formal changed); a composite component is required when • If γ(c1) < γ(c2) holds for two components c1 and c2, low-level components are detected. in the middle level of the lattice. the lattice, which indicates that ellipses and circles are phase, in which the system needs to be more rigorously data, one might need some knowledge on internal details References o3 ! ! ! ! ! context, whereas features will be considered attributes. pretation): divergences between names (mostly, names are similar The resulting concepts contain subprograms grouped comparison point: the text drawing concept, marked as only localizes a feature. The derived relationships are an [16]Staudenmayer, N.S. 
and Perry, D.E., ‘Session 5: Key Tech- phases and, thus, Bayer et al. rightly demand an early inte- 1. The economically relevant features are ascertained by isolate non-functional aspects, like security, in code and the superconcept/subconcept relation < as shown in one of its parts is required. The framed area in Figure 4 has a simpler structure widely separate from all other objects. General observations. We made the experience that analyzed. The purpose of our technique is to derive the of a system. o4 ! ! ! ! ! ! Note that in the reverse case, the concept lattice is simply then component c2 requires component c1. enough) or has to reverse name mangling. together according to their usage for features. Note that the node #1, has 29 components). As Figure 4 shows, there are import information to product family experts and represent [1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, niques and Process Aspects for Product Line Development’, gration of reverse engineering into a product family product family engineers and market analysts. map them to specific components. For instance, one could Figure 2. The most general concept is called the top ele- In order to obtain the relation, a set of usage scenarios • A component, c, is required for all features at and than the rest of the lattice. This part deals with circles and applying our method is easy in principle. However, run- feature component map. It handles the system as a black The implementation of this technique was surprisingly J.-M., ‘Transitioning Legacy Assets - a Product Line Proc. of the 10th International Software Process Workshop, • If µ(f1) < µ(f2) holds for two features f1 and f2, then a few concepts containing most of the components (i.e., Second experiment. In a second experiment, we analyzed additional dependencies that need to be considered in a approach [1]. 
Early reverse engineering is needed to derive concentrate all network accesses in one single component Table 1: Example relation. ment and is denoted by . The most special concept is inverted but the derived information will be the same. needs to be prepared where each scenario exploits prefera- above γ(c) – as defined by (1) – in the lattice. more general subprograms can be found at the lower con- ellipses and its details are shown in Figure 6. Each node, ning all scenarios by hand is time consuming. It may be box and, hence, does not give insights in internal aspects simple. We opportunistically put together a set of publicly Approach’, Proceedings of the SIGSOFT Foundations of June 1996, Ventron FR. 2. The feature component map is derived based on the called the bottom element and is denoted by ⊥ . The set of relevant features will be determined by the 4. Case Study cepts in the lattice since they are used for many features, subprograms) of the system. The lattice contains 47 con- the edit mode rotate which comes in two variants: clock- decision for certain features and components. first coarse information on existing system components identified relevant features. to enable controlled secure connections. A pair (O, A) is called concept if A = σ ( O ) ∧ O = τ ( A ) bly only one relevant feature. Then the system is used • A feature, f, requires all components at and below µ(f) feature f1 is based on feature f2. cepts. 26 of them introduce at least one new component, N, in Figure 6 contains two sets: The upper set contains all wise and counterclockwise. The first ten shapes in facilitated by the presence of test cases that allow an auto- with respect to quality and effort. available tools and wrote a few Perl scripts (140 LOC in Software Engineering, Toulouse, pp. 446-463, Association of [17]Wilde, N. and Scully, M.C., ‘Software Reconnaissance: (assets) timely needed by a product family analyst to The remainder of this article is organized as follows. 
The combination of the graphical representation in product family experts. For components, we can consider while specific components are in the upper region of the mated replay of various scenarios. Wilde and Scully [17] also use dynamic analysis to Computing Machinery (ACM), 1999. according to the set of usage scenarios, one at a time, and – as defined by (2) – in the lattice. One has to note that the latter relationship between fea- As a case study, we analyzed the Xfig system [18] components attached to the node, i.e., those components, total) for interoperability, which took us just one day. A Mapping Program Features to Code’, Software Maintenance: investigate feasibility and to estimate costs of different 3. The previously derived feature component map gives Section 2 introduces concept analysis. Section 3 explains holds, i.e., all objects share all attributes. For a concept c = Figure 2 and the contents of the concepts in Table 2 the following alternatives depending on how much knowl- lattice. Hence, the concept lattice also reflects the level of i.e., to these nodes, a component is attached (more pre- c, for which γ(c) = N; the lower set contains all features of Figure 5 were drawn and rotated once clockwise and once Because Xfig has a GUI, running a single scenario by localize features as follows: 6. Conclusions drawback of our simple implementation is that one has to [2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Research and Practice, vol. 7, pp. 49-62, 1995. (O, A), O is the extent of c, denoted by extent(c), and A is the execution traces are recorded. An execution trace con- • A component, c, is specific to exactly one feature, f, if tures safely holds for the analyzed system only, i.e., this (version 3.2.1) consisting of about 76 KLOCs written in cisely, a concept C introduces a component if there exists a counterclockwise, which resulted in 20 scenarios. 
The alternative ways to get to a suitable product family archi- additional insights into dependencies among features how concept analysis can be used to derive the feature together form the concept lattice. The complete informa- edge on the system architecture is already available: abstraction of these subprograms within the given set of N, including those inherited from other concepts. The hand is an easy task. However, one has to pay attention not run the system for each usage scenario from the beginning Study’, Proc. of the 21st International Conference on Soft- [18]Xfig system, http://www.xfig.org. tains all required low-level components for a usage sce- f is the only feature on all paths from γ(c) to the top relationship is not necessarily true for the features as such, the programming language C. In this section, we will 1. The invoking input set I (i.e., a set of test cases or – in A feature component map describes which components ware Engineering (ICSE’99), (Los Angeles, CA, USA), pp. 2 3 4 5 6 7 8 9 10 1
  • 13.
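Section 2 of the paper on the slide defines concept analysis over a binary relation R between objects and attributes. σ(O) is the set of attributes common to a set of objects O, and τ(A) is the set of objects common to a set of attributes A. A pair (O, A) is a concept if A = σ(O) and O = τ(A). The order is (O1, A1) ≤ (O2, A2) iff O1 ⊆ O2, the infimum intersects the extents, and the supremum intersects the intents.

The Python sketch below applies these definitions to the paper's Table 1 example (taken from Lindig and Snelting). The relation is reconstructed from the values quoted in the text, such as σ({o1}) = {a1, a2} and τ({a7, a8}) = {o3, o4}. It is an illustration only, not the authors' tool chain, which combined an existing concepts tool, Graphlet, and Perl scripts.

Several elements are assumptions of this sketch, following standard concept analysis:

* the function names;
* the brute-force enumeration;
* the definitions of γ(o), the most special concept containing object o, and µ(a), the most general concept containing attribute a.

In the paper's setting, components play the role of objects and features the role of attributes.

```python
from itertools import combinations

# Table 1 of the paper (example from Lindig and Snelting), reconstructed from
# the concepts and equations quoted in the text: object -> attributes marked "!".
RELATION = {
    "o1": frozenset({"a1", "a2"}),
    "o2": frozenset({"a3", "a4", "a5"}),
    "o3": frozenset({"a3", "a4", "a6", "a7", "a8"}),
    "o4": frozenset({"a3", "a4", "a5", "a6", "a7", "a8"}),
}
OBJECTS = frozenset(RELATION)
ATTRIBUTES = frozenset().union(*RELATION.values())


def sigma(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = ATTRIBUTES
    for o in objs:
        common = common & RELATION[o]
    return common


def tau(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o in OBJECTS if attrs <= RELATION[o])


def concepts():
    """All concepts (O, A) with A = sigma(O) and O = tau(A).

    Closing every subset of objects reaches every concept; this is
    exponential and only meant for toy relations such as Table 1.
    """
    result = set()
    for k in range(len(OBJECTS) + 1):
        for subset in combinations(sorted(OBJECTS), k):
            intent = sigma(subset)
            result.add((tau(intent), intent))
    return result


def leq(c1, c2):
    """c1 is a subconcept of c2 iff extent(c1) is a subset of extent(c2)."""
    return c1[0] <= c2[0]


def infimum(c1, c2):
    """(O1, A1) meet (O2, A2) = (O1 & O2, sigma(O1 & O2))."""
    extent = c1[0] & c2[0]
    return (extent, sigma(extent))


def supremum(c1, c2):
    """(O1, A1) join (O2, A2) = (tau(A1 & A2), A1 & A2)."""
    intent = c1[1] & c2[1]
    return (tau(intent), intent)


def gamma(o):
    """Most special concept whose extent contains object o."""
    intent = sigma({o})
    return (tau(intent), intent)


def mu(a):
    """Most general concept whose intent contains attribute a."""
    extent = tau(frozenset({a}))
    return (extent, sigma(extent))


if __name__ == "__main__":
    # Print concepts from the top element (largest extent) downwards.
    for extent, intent in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

Running the script prints the seven concepts of Table 1. The list starts with the top element ({o1, o2, o3, o4}, ∅) and ends with the bottom element (∅ together with all eight attributes).

For instance, gamma("o2") and mu("a5") both return ({o2, o4}, {a3, a4, a5}). This is the node that the paper's sparse representation marks with o2 and a5.

Enumerating all subsets of objects is exponential, so this approach only suits toy relations. For relations derived from real execution traces, a dedicated lattice construction algorithm would be more appropriate.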
Unfortunately, the reviewers did not like the paper very much; it was accepted only as a short paper. We took the reviewers' comments seriously, improved the paper, and added more case studies. It was a complete rewrite, but the essential idea survived. We submitted the new paper to ICSM, and it received the best paper award. So, young researchers: never give up!

Derivation of Feature Component Maps by means of Concept Analysis

Thomas Eisenbarth, Rainer Koschke, Daniel Simon
University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
{eisenbts, koschke, simondl}@informatik.uni-stuttgart.de

Abstract

Feature component maps describe which components are needed to implement a particular feature and are used early in processes to develop a product family based on existing components. This paper describes a new technique to derive the feature component map and additional dependencies utilizing dynamic information and concept analysis. The method is simple to apply, cost-effective, largely language independent, and can yield results quickly and very early in the process.

1. Introduction

Developing similar products as product families promises several advantages over relatively expensive separate developments, like lower costs and shorter time for development, test, and maintenance. These advantages are based on the fact that all family members share a common infrastructure, also known as the platform architecture. There are many approaches to newly developing product families from scratch [2, 11]. However, according to Martinez [16], most successful examples of product families at Motorola originated in a single separate product; only in the course of time did a shared architecture for a product family evolve. Moreover, large investments impose a reluctance against introducing a product family approach that ignores existing assets. Hence, an introduction of a product family approach generally has to cope with existing code.

Reverse engineering may help create a product family for existing systems by identifying and analyzing the components and also by deriving the individual architecture from each system. These individual architectures may then be unified into a platform architecture, and the derived components may be used to populate the unified architecture. To this end, code needs to be adjusted, reengineered, or wrapped. However, changing or wrapping the code is […]

[…] are required to implement a particular feature and is needed at an early stage within a process toward a product family platform
• to weigh alternative platform architectures,
• to aim further tasks, like quality assessment, at only those existing components that are needed to populate the platform architecture,
• and to decide on further steps, like reengineering or wrapping.

One important piece of information for a product family analysis that tries to integrate existing assets is the so-called feature component map, which describes which components are needed to implement a particular feature. A feature is a realized (functional as well as non-functional) requirement (the term feature is intentionally weakly defined because its exact meaning depends on the specific context). Components are computational units of a software architecture (see Section 3.1). Because the feature component map is needed very early to trade off alternatives in good time, complete and hence time-consuming reverse engineering of the system is out of the question. In particular, the decision for a certain alternative will in many cases lead to a consolidation on specific, economically important core components and hence to an exclusion of less important components. Any investment in a deep and costly pre-analysis of less important components would be in vain to a large degree. Instead, reverse engineering in early phases should give information on the feature component map quickly and with simple means. To this end, the product line analyst imparts all relevant features, for which the necessary components need to be detected, to the reverse engineer, who in turn delivers the feature component map. On the basis of the feature component map and additional economic reasons, a decision is made for particularly interesting and required components, and further expensive analyses regarding quality can be cost-effectively aimed at selected components.

This paper describes a quickly realizable technique to ascertain the feature component map based on dynamic information (gained from execution traces) and concept analysis. The technique is automatic to a great extent. Concept analysis is a mathematical technique to investigate binary relations (see Section 2). We want to point out that not all non-functional requirements, e.g., time constraints, can be easily mapped […]

Overview. The technique described here is based on the execution traces generated by a profiler for different usage scenarios (see Figure 1). One scenario represents the invocation of one single feature and yields all subprograms executed for this feature. These subprograms identify the components (or are themselves considered components) required for a certain feature. The required components for all scenarios and the set of features are then subject to concept analysis. Concept analysis gives information on relationships between features and required components as well as feature-feature and component-component dependencies.

[Figure 1. Overview (labels: feature F; usage scenario; execution trace; required components C1 … Cn; (F, C1), …, (F, Cn) ∈ R; concept analysis; feature component map and dependencies).]

Integration into a Product Family Process. A simple […]

[…] and components and, hence, into the feasibility and costs of different alternative product family platforms. The knowledge gained from the feature component map and additional economic considerations may lead to a further selection of only a certain subset of all features and their corresponding components.
4. The selected components are analyzed more closely, for instance, with respect to maintainability, extractability, and integrability.
5. A product family platform is designed. Alternatives for components to populate the product family platform are weighed: component extraction and reengineering, new development, integration of COTS, or wrapping.
6. A migration plan is prepared.

The technique described in this article is used to derive the feature component map, which plays a central role early in this process. […]
[…] component map, and Section 4 describes our experience with this technique in a case study. Section 5 discusses related research.

2. Concept Analysis

Concept analysis is a mathematical technique that provides insights into binary relations. The mathematical foundation of concept analysis was laid by Birkhoff in 1940. It has already been used successfully in other fields of software engineering. The binary relation in our specific application of concept analysis to derive the feature component map states which components are required when a feature is invoked. This section describes concept analysis in more detail.

Concept analysis is based on a relation R between a set of objects 𝒪 and a set of attributes 𝒜, hence R ⊆ 𝒪 × 𝒜. The tuple C = (𝒪, 𝒜, R) is called a formal context. For a set of objects O ⊆ 𝒪, the set of common attributes σ is defined as:

$$\sigma(O) = \{\, a \in \mathcal{A} \mid \forall o \in O : (o, a) \in R \,\}$$

Analogously, the set of common objects τ for a set of attributes A ⊆ 𝒜 is defined as:

$$\tau(A) = \{\, o \in \mathcal{O} \mid \forall a \in A : (o, a) \in R \,\}$$

In Section 3.1, the formal context for applying concept analysis to derive the feature component map will be laid down as follows:
• components will be considered objects,
• features will be considered attributes,
• a pair (component c, feature f) is in relation R if c is executed when f is invoked.

For the time being, however, we will use as an abstract example the binary relation between arbitrary objects and attributes shown in Table 1. An object oi has attribute aj if row i and column j are marked with a ✓ in Table 1 (the example stems from Lindig and Snelting [7]). For instance, the following equations hold for this table, also known as the relation table:

$$\sigma(\{o_1\}) = \{a_1, a_2\} \qquad \tau(\{a_7, a_8\}) = \{o_3, o_4\}$$
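The definitions of σ and τ translate directly into set operations. The following Python sketch is only an illustration (it is not the authors' tool): it assumes the relation table is stored as a dictionary from objects to attribute sets. Row o1 is the one shown in Table 1; the rows for o2, o3, and o4 are an assumption, reconstructed from the concepts listed in Table 2.

```python
# Relation table of the example: object -> set of attributes.
# Row o1 is taken from Table 1; rows o2-o4 are reconstructed from Table 2 (assumption).
RELATION = {
    "o1": {"a1", "a2"},
    "o2": {"a3", "a4", "a5"},
    "o3": {"a3", "a4", "a6", "a7", "a8"},
    "o4": {"a3", "a4", "a5", "a6", "a7", "a8"},
}
OBJECTS = frozenset(RELATION)
ATTRIBUTES = frozenset().union(*RELATION.values())


def sigma(objects):
    """Common attributes: every a with (o, a) in R for all o in objects."""
    common = set(ATTRIBUTES)  # sigma of the empty set is the whole attribute set
    for o in objects:
        common &= RELATION[o]
    return frozenset(common)


def tau(attributes):
    """Common objects: every o with (o, a) in R for all a in attributes."""
    wanted = set(attributes)
    return frozenset(o for o in OBJECTS if wanted <= RELATION[o])


print(sorted(sigma({"o1"})))      # ['a1', 'a2']
print(sorted(tau({"a7", "a8"})))  # ['o3', 'o4']
```

Both printed results reproduce the two equations stated for Table 1.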
      a1  a2  a3  a4  a5  a6  a7  a8
o1    ✓   ✓
[…]

[…] the intent of c, denoted by intent(c). Informally, a concept corresponds to a maximal rectangle of filled table cells modulo row and column permutations. For example, Table 2 contains the concepts for the relation in Table 1.

Table 2: Concepts for Table 1.
C1   ({o1, o2, o3, o4}, ∅)
C2   ({o2, o3, o4}, {a3, a4})
C3   ({o1}, {a1, a2})
C4   ({o2, o4}, {a3, a4, a5})
C5   ({o3, o4}, {a3, a4, a6, a7, a8})
C6   ({o4}, {a3, a4, a5, a6, a7, a8})
C7   (∅, {a1, a2, a3, a4, a5, a6, a7, a8})

The set of all concepts of a given formal context forms a partial order via

$$(O_1, A_1) \le (O_2, A_2) \iff O_1 \subseteq O_2$$

or, equivalently,

$$(O_1, A_1) \le (O_2, A_2) \iff A_1 \supseteq A_2 .$$

If c1 ≤ c2 holds, then c1 is called a subconcept of c2, and c2 is called a superconcept of c1. For instance, ({o2, o4}, {a3, a4, a5}) ≤ ({o2, o3, o4}, {a3, a4}) is true in Table 2.

The set of all concepts of a given formal context and the partial order ≤ form a complete lattice, called the concept lattice L:

$$L(C) = \{\, (O, A) \in 2^{\mathcal{O}} \times 2^{\mathcal{A}} \mid A = \sigma(O) \wedge O = \tau(A) \,\}$$

The infimum of two concepts in this lattice is computed by intersecting their extents as follows:

$$(O_1, A_1) \wedge (O_2, A_2) = (O_1 \cap O_2,\ \sigma(O_1 \cap O_2))$$

The infimum describes a set of common attributes of two sets of objects. Similarly, the supremum is determined by intersecting the intents:

$$(O_1, A_1) \vee (O_2, A_2) = (\tau(A_1 \cap A_2),\ A_1 \cap A_2)$$

The supremum ascertains the set of common objects that share all attributes in the intersection of two sets of attributes.
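To check these lattice operations on the example, a brute-force Python sketch can enumerate all concepts. It relies on the fact that every concept has the form (τ(σ(X)), σ(X)) for some set of objects X, which is practical only for very small contexts; it does not reproduce the concepts tool [8]. The rows for o2 to o4 are again an assumption reconstructed from Table 2.

```python
from itertools import combinations

# Example relation (row o1 from Table 1; rows o2-o4 reconstructed from Table 2).
RELATION = {
    "o1": {"a1", "a2"},
    "o2": {"a3", "a4", "a5"},
    "o3": {"a3", "a4", "a6", "a7", "a8"},
    "o4": {"a3", "a4", "a5", "a6", "a7", "a8"},
}
OBJECTS = frozenset(RELATION)
ATTRIBUTES = frozenset().union(*RELATION.values())


def sigma(objects):
    common = set(ATTRIBUTES)
    for o in objects:
        common &= RELATION[o]
    return frozenset(common)


def tau(attributes):
    wanted = set(attributes)
    return frozenset(o for o in OBJECTS if wanted <= RELATION[o])


def all_concepts():
    """Close every subset X of objects to (tau(sigma(X)), sigma(X)).

    Exponential in the number of objects, so only meant for toy contexts.
    """
    found = set()
    for k in range(len(OBJECTS) + 1):
        for subset in combinations(sorted(OBJECTS), k):
            intent = sigma(subset)
            found.add((tau(intent), intent))
    return found


def leq(c1, c2):
    """(O1, A1) <= (O2, A2) iff O1 is a subset of O2."""
    return c1[0] <= c2[0]


def infimum(c1, c2):
    """(O1 ∩ O2, sigma(O1 ∩ O2))"""
    extent = c1[0] & c2[0]
    return (extent, sigma(extent))


def supremum(c1, c2):
    """(tau(A1 ∩ A2), A1 ∩ A2)"""
    intent = c1[1] & c2[1]
    return (tau(intent), intent)


concepts = all_concepts()
print(len(concepts))  # 7, matching Table 2
```

For example, the infimum of C4 and C5 is C6 and their supremum is C2, as the formulas above predict.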
Graphically, the concept lattice for the example relation […]

[Figure 2. Concept lattice for Table 1 (nodes C1 to C7).]

[…] can be visualized in a more readable, equivalent way by marking only the graph node with an attribute a ∈ 𝒜 whose represented concept is the most general concept that has a in its intent. Analogously, a node will be marked with an object o ∈ 𝒪 if it represents the most special concept that has o in its extent. The unique element µ in the concept lattice marked with a is therefore:

$$\mu(a) = \bigvee \{\, c \in L(C) \mid a \in \mathrm{intent}(c) \,\} \qquad (1)$$

The unique element γ marked with object o is:

$$\gamma(o) = \bigwedge \{\, c \in L(C) \mid o \in \mathrm{extent}(c) \,\} \qquad (2)$$

We will call a graph representing a concept lattice using this marking strategy a sparse representation. The equivalent sparse representation for Figure 2 is shown in Figure 3. The content of a node N in this representation can be derived as follows:
• the objects of N are all objects at and below N,
• the attributes of N are all attributes at and above N.

For instance, the node in Figure 3 marked with o2 and a5 is the concept ({o2, o4}, {a3, a4, a5}).

[Figure 3. Sparse representation of Figure 2 (node labels: a1, a2; a3, a4; a5; a6, a7, a8; o1; o2; o3; o4).]
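Equations (1) and (2) can be evaluated without building the whole lattice: the most general concept with a in its intent is (τ({a}), σ(τ({a}))), and the most special concept with o in its extent is (τ(σ({o})), σ({o})). The Python sketch below applies this shortcut to the example relation (rows o2 to o4 again reconstructed from Table 2, an assumption) and collects the labels of the sparse representation.

```python
# Example relation (row o1 from Table 1; rows o2-o4 reconstructed from Table 2).
RELATION = {
    "o1": {"a1", "a2"},
    "o2": {"a3", "a4", "a5"},
    "o3": {"a3", "a4", "a6", "a7", "a8"},
    "o4": {"a3", "a4", "a5", "a6", "a7", "a8"},
}
OBJECTS = frozenset(RELATION)
ATTRIBUTES = frozenset().union(*RELATION.values())


def sigma(objects):
    common = set(ATTRIBUTES)
    for o in objects:
        common &= RELATION[o]
    return frozenset(common)


def tau(attributes):
    wanted = set(attributes)
    return frozenset(o for o in OBJECTS if wanted <= RELATION[o])


def mu(a):
    """Most general concept whose intent contains attribute a (equation 1)."""
    extent = tau({a})
    return (extent, sigma(extent))


def gamma(o):
    """Most special concept whose extent contains object o (equation 2)."""
    intent = sigma({o})
    return (tau(intent), intent)


def sparse_labels():
    """Map each labelled concept to (objects attached, attributes attached)."""
    labels = {}
    for a in sorted(ATTRIBUTES):
        labels.setdefault(mu(a), ([], []))[1].append(a)
    for o in sorted(OBJECTS):
        labels.setdefault(gamma(o), ([], []))[0].append(o)
    return labels


for concept, (objs, attrs) in sparse_labels().items():
    print(sorted(concept[0]), sorted(concept[1]), "labels:", objs + attrs)
```

The node labelled o2 and a5 comes out as ({o2, o4}, {a3, a4, a5}), and the top and bottom elements receive no labels, in line with Figure 3.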
3. Feature Component Map

In order to derive the feature component map via concept analysis, one has to define the formal context (objects, attributes, relation) and to interpret the resulting concept lattice accordingly.

3.1. Context for Feature and Components

[…]
1. cohesive modules and subsystems as defined and documented by the system's architects or regained by reengineers; modules and subsystems will be considered composite components in the following;
2. physical modules, i.e., modules as defined by means of the underlying programming language or simply directly available as existing files (the distinction from cohesive modules is that one does not know a priori whether physical modules really group cohesive declarations; physical modules are the unscrutinized result of a programmer's way of grouping declarations, whether it makes sense or not);
3. subprograms, i.e., functions and procedures, and global variables of the system; subprograms and global variables will be called low-level components in the following.

Ideally, one will use alternative (1) when reliable and complete documentation exists. However, if cohesive modules and subsystems are not known in advance, one would hardly make the effort to analyze a large system to obtain these in order to apply concept analysis to get the feature component map, because it is not yet clear which components are relevant at all, and reverse engineering the complete system first will likely not be cost-effective. Only later, if the retrieved feature component map (using simpler definitions of components, like those in (2) or (3)) clearly shows which lower-level components should be investigated further to obtain composite components, may reverse engineering generally pay off (in order to detect cohesive modules, we have developed a semi-automatic method integrating many automatic state-of-the-art techniques [9]). Alternative (2) can be chosen if suitable documentation is not available but there is reason to trust the programmers of the system to a great extent. In all other cases, one will fall back on alternative (3). However, for alternative (3), concept analysis may additionally yield hints on sets of related subprograms forming composite components.

The relation for the formal context necessary for concept analysis is defined as follows:

(C, F) ∈ R if and only if component C is required when feature F is invoked; a subprogram is required when it needs to be executed; a global […]

[…] scenario or an invoked feature, respectively. If composite components are used for concept analysis, the execution trace containing the required low-level components induces an execution trace for composite components by replacing each low-level component with the composite component to which it belongs. Hence, each system run yields all required components for a single scenario that exploits one feature. Thus, a single column in the relation table can be obtained per system run. Applying all usage scenarios provides the relation table.

An execution trace can be recorded by a profiler. However, most profilers only record subprogram calls but not accesses to variables. Instead of using a symbolic debugger, for example, that allows one to set watchpoints on variable accesses, or even instrumenting the code if no sophisticated profiler is available, one can also use a simple static dependency analysis: one considers all variables directly and statically accessed by each executed subprogram to be dynamically accessed as well (all transitively accessed variables will automatically be considered because all executed subprograms are examined). In practice, this analysis may be a sufficient approximation. But one should be aware that it may overestimate references, because variable accesses may be included that are on paths not executed at runtime, and it will also ignore references to variables by means of aliases if the simple static dependency analysis does not take aliasing into account. For a first analysis to obtain a simplified feature component map, one can also ignore variables and come back to these in a later phase using more sophisticated dynamic or static analyses.
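The relation of Section 3.1 can be assembled from per-scenario profiles along the following lines. This is a hedged Python sketch, not the authors' implementation: the input format (feature name mapped to subprogram call counts), the optional mapping from low-level to composite components, and the optional static table of variable accesses are all assumptions standing in for the profiler output and analyses described above. All subprogram and variable names in the example are hypothetical.

```python
def relation_from_traces(traces, owner=None, accesses=None):
    """Build R as a set of (component, feature) pairs, one profiled run per feature.

    traces   -- feature -> {subprogram: call count} (assumed already parsed)
    owner    -- optional low-level component -> composite component mapping
    accesses -- optional subprogram -> global variables it references statically
    """
    relation = set()
    for feature, counts in traces.items():
        # A subprogram is required for the feature if it ran at least once.
        executed = {s for s, n in counts.items() if n > 0}
        required = set(executed)
        if accesses:
            # Static approximation: treat variables statically accessed by an
            # executed subprogram as dynamically accessed, too.
            for s in executed:
                required.update(accesses.get(s, ()))
        for low_level in required:
            # Lift to the composite component if one is known, else keep as is.
            component = owner.get(low_level, low_level) if owner else low_level
            relation.add((component, feature))
    return relation


# Hypothetical profiles for two scenarios.
traces = {
    "draw-ellipse": {"main": 1, "draw_ellipse": 4, "draw_text": 0},
    "draw-text": {"main": 1, "draw_text": 2},
}
print(sorted(relation_from_traces(traces)))
```

Dropping zero counts up front is what keeps never-executed subprograms out of the lattice, so a superfluous top element (see Section 3.2) cannot arise.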
3.2. Interpretation of the Concept Lattice

Concept analysis applied to the formal context described in the last section gives a lattice from which interesting relationships can be derived. These relationships can be derived fully automatically and presented to the analyst such that the more complicated theoretical background can be hidden. The only thing an analyst has to know is how to interpret the derived relationships. This section explains how interesting relationships can be derived automatically.

As already abstractly described in Section 2, the following base relationships can be derived from the sparse […]
• […] element.
• A feature, f, is specific to exactly one component, c, if c is the only component on all paths from µ(f) to the bottom element (i.e., c is the only component required to implement feature f).
• Features to which two components, c1 and c2, jointly contribute can be identified by γ(c1) ∨ γ(c2); graphically depicted, one ascertains in the lattice the closest common node toward the top element, starting at the nodes to which c1 and c2, respectively, are attached; all features at and above this common node are those jointly implemented by these components.
• Components jointly required for two features, f1 and f2, are described by µ(f1) ∧ µ(f2); graphically depicted, one ascertains in the lattice the closest common node toward the bottom element, starting at the nodes to which f1 and f2, respectively, are attached; all components at and below this common node are those jointly required for these features.
• Components required for all features can be found at the bottom element.
• Features that require all components can be found at the top element.
• If the top element does not contain features, then all components in the top element are superfluous (such components will not exist when the set of objects for concept analysis contains only components executed at least once, which is the case if a filter ignores all subprograms for which the profiler reports an execution count of 0).
• If the bottom element does not contain any component, all features in the bottom element are not implemented by the system (this constellation will not exist if there is a usage scenario for each feature and every usage scenario is appropriate and relevant to the system; a system may indeed not have all features, i.e., a usage scenario may be meaningless for a given system).
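Because components are objects and features are attributes, several of these rules reduce to intersections over the relation, without drawing the lattice. The following Python sketch computes the components required by two features (the extent of µ(f1) ∧ µ(f2)), the features served by two components (the intent of γ(c1) ∨ γ(c2)), the components at the bottom element, and the components specific to exactly one feature. The relation in the example is hypothetical; the function names merely resemble labels from the case study.

```python
def components_for(relation, feature):
    return {c for c, f in relation if f == feature}


def features_for(relation, component):
    return {f for c, f in relation if c == component}


def jointly_required(relation, f1, f2):
    """Components at and below mu(f1) ∧ mu(f2): needed by both features."""
    return components_for(relation, f1) & components_for(relation, f2)


def jointly_implemented(relation, c1, c2):
    """Features at and above gamma(c1) ∨ gamma(c2): served by both components."""
    return features_for(relation, c1) & features_for(relation, c2)


def required_for_all(relation):
    """Extent of the bottom element: components required by every feature."""
    features = {f for _, f in relation}
    if not features:
        return set()
    return set.intersection(*(components_for(relation, f) for f in features))


def specific_components(relation):
    """Component -> its only feature, for components required by exactly one feature."""
    result = {}
    for c in {c for c, _ in relation}:
        fs = features_for(relation, c)
        if len(fs) == 1:
            result[c] = next(iter(fs))
    return result


# Hypothetical relation R as (component, feature) pairs.
R = {
    ("main", "draw-text"), ("main", "draw-arc"), ("main", "draw-line"),
    ("draw_text", "draw-text"), ("draw_arc", "draw-arc"),
    ("elastic_line", "draw-arc"), ("elastic_line", "draw-line"),
    ("add_line", "draw-line"),
}
print(sorted(required_for_all(R)))  # ['main']
```

Here elastic_line sits at a shared node for the arc and line features, while draw_text, draw_arc, and add_line are each specific to a single feature.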
Beyond these relationships between components and features, further useful aspects between features on the one hand and between components on the other hand may be […]

[…] because the relationship was derived only from a specific implementation. The information described above can be derived by a tool and fed back to the product family expert. As soon as a decision is made to reuse certain features, all components required for these features (easily derived from the concept lattice) form a starting point for further analyses to investigate quality (like maintainability, extractability, and integrability) and to estimate effort for subsequent steps (wrapping, reengineering, or redevelopment from scratch).

3.3. Implementation

The implementation of the described approach is surprisingly simple (if one already has a tool for concept analysis). Our prototype for a Unix environment is an opportunistic integration of the following parts:
• GNU C compiler gcc to compile the system using a command line switch for generating profiling information,
• GNU object code viewer nm and a short Perl script in order to identify all functions of the system (as opposed to those included from standard libraries),
• GNU profiler gprof and a short Perl script to ascertain the executed functions in the execution trace,
• concept analysis tool concepts [8],
• graph editor Graphlet [3] to visualize the concept lattice,
• and two more short Perl scripts to convert the file formats of concepts and Graphlet (all Perl scripts together have just 147 LOC).
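The nm and gprof steps filter the profiler's results down to the system's own functions. The authors used short Perl scripts; the Python sketch below only illustrates the idea. It assumes nm's default BSD output format ("value type name", with undefined symbols such as library calls listed without a value), applied to the system's own object files, and it takes call counts as an already parsed dictionary rather than parsing gprof's report. All symbol names in the example are illustrative.

```python
def system_functions(nm_output):
    """Names of functions defined in the system's own object code.

    Assumes nm's default BSD format, one symbol per line: "value type name".
    Undefined symbols (type U) have no value column and are skipped; types
    T and t mark global and local symbols in the text (code) section.
    """
    functions = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("T", "t"):
            functions.add(parts[2])
    return functions


def executed_system_functions(functions, call_counts):
    """Keep system functions that the profiler reports as called at least once."""
    return {f for f, n in call_counts.items() if f in functions and n > 0}


nm_text = (
    "0000000000001139 T draw_ellipse\n"
    "0000000000001180 t elastic_ebr\n"
    "                 U printf\n"
    "0000000000004010 D cur_color\n"
)
funcs = system_functions(nm_text)
print(sorted(funcs))  # ['draw_ellipse', 'elastic_ebr']
```

Library functions such as printf never enter the relation because they are undefined in the system's object files, even when the profiler counts calls to them.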
The fact that the subprograms are extracted from the object code makes the implementation independent of the programming language to a great extent (as long as the language is compiled to object code) and has the advantage that no front end is necessary. On the other hand, because a compiler may replace source names by link names in the object code (for instance, C++ compilers use name mangling to resolve overloading), there is not always a direct mapping from the subprograms in the execution trace back to the original source. Because we dealt with C code in our case study, object code names were identical […]

[…] firstly present a general overview of the results and secondly go into further details for particularly interesting observations.

Xfig is a menu-driven tool that allows the user to draw and manipulate objects interactively under the X Window System. Objects can be lines, polygons, circles, rectangles, splines, text, and imported pictures. An interesting first task in our case study was to define what constitutes a feature. Clearly, the capability to draw specific objects, like lines, splines, rectangles, etc., can be considered a feature of Xfig. Moreover, one can manipulate drawn objects in different edit modes (rotate, move, copy, scale, etc.) with Xfig. Hence, we considered the following two capabilities as main features:
1. ability to draw different shapes (lines, curves, rectangles, etc.)
2. ability to modify shapes in different editing modes (rotate, move, copy, scale, etc.)

We conducted two experiments. In the first one, we investigated the ability to draw different shapes only. In the second one, we analyzed the ability to modify shapes. The second experiment exemplifies combined features composed of basic features. For the second experiment, a shape is drawn and then modified. Both draw and modify constitute a basic feature. Combined features add to the effort needed to derive the feature component map, as there are many possible combinations.

In both experiments, we considered subprograms as components. However, in our simple implementation, we do not handle variable accesses. Hence, not all required […]

[…] to cause interferences by invoking irrelevant features. For instance, Xfig uses a balloon help facility that pops up a little window when the cursor stays some time on a sensitive area of the GUI (e.g., over the button selecting the circle drawing mode). Sometimes the balloon help mechanism triggers, introducing interferences between features. Such effects affect the analysis because they introduce spurious connections between features. Fortunately, this problem can be partly fixed by providing a specific scenario in which only the accidentally invoked irrelevant feature is invoked, which leads to a refactored concept lattice that contains a new concept isolating the irrelevant feature and its components. In our example, interferences due to an accidentally invoked irrelevant feature appeared only at the two layers directly on top of the bottom element of the lattice and could be more or less ignored.

First experiment. In our first experiment, we prepared 15 scenarios. Each scenario invokes Xfig, performs the drawing of one of the objects Xfig provides, and then terminates Xfig, i.e., the aspects above were not combined and no other functionality of Xfig was used. We used all shapes of Xfig's drawing panel shown in Figure 5 except picture objects and library objects. The resulting lattice for this experiment is shown in Figure 4. The contents of the concepts in the lattice are omitted for readability reasons. However, their size in this picture is a linear function of their number of components (except for the bottom element, which contains 136 components, mostly initialization and GUI code and very basic […]

[…] component c for which γ(c) = C holds). The names of the features correspond to the objects drawn via the panel in Figure 5; e.g., draw-ellipse-radius means that an ellipse was drawn where the radius was specified (as opposed to the diameter).

[Figure 4. Lattice for the first experiment. The taller a concept is, the more components it contains. (Node labels are Xfig function names such as draw_line, draw_arc, draw_spline, draw_ellipse, and draw_text; the figure also carries the label "see Figure 6".)]

[…]
21 of the concepts do not introduce any new component and merely merge functionality needed by several superconcepts. The first interesting observation is that concepts with many components can be found in the upper region, while in the lower region the number of components decreases and the number of interferences increases (an interference leads to an unstructured lattice; a lattice is said to be structured if it can be decomposed into independent sublattices that are connected via the top and bottom elements only). That is to say, there are many specific operations and few shared operations, and the shared operations are really used for many features.
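Whether a lattice is structured in the sense just defined can be checked mechanically. One simple way to operationalize the definition (an assumption, not a procedure from the paper) is to remove the top and bottom elements and test whether the remaining Hasse diagram falls apart into at least two connected parts. The Python sketch below identifies each concept by its extent and, because the Xfig lattices are not reproduced in the text, uses the concepts of Table 2 as input.

```python
def covers(extents):
    """Hasse edges (upper, lower): lower's extent is a maximal proper subset of upper's."""
    edges = set()
    for upper, e_up in extents.items():
        for lower, e_low in extents.items():
            if e_low < e_up and not any(e_low < e_mid < e_up for e_mid in extents.values()):
                edges.add((upper, lower))
    return edges


def is_structured(extents, top, bottom):
    """True if the diagram without top and bottom splits into two or more connected parts."""
    inner = set(extents) - {top, bottom}
    neighbours = {node: set() for node in inner}
    for upper, lower in covers(extents):
        if upper in inner and lower in inner:
            neighbours[upper].add(lower)
            neighbours[lower].add(upper)
    parts, seen = 0, set()
    for start in inner:
        if start in seen:
            continue
        parts += 1  # found a new connected part (candidate sublattice)
        stack = [start]
        seen.add(start)
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return parts >= 2


# Concepts of Table 2, identified by their extents.
EXTENTS = {
    "C1": frozenset({"o1", "o2", "o3", "o4"}),
    "C2": frozenset({"o2", "o3", "o4"}),
    "C3": frozenset({"o1"}),
    "C4": frozenset({"o2", "o4"}),
    "C5": frozenset({"o3", "o4"}),
    "C6": frozenset({"o4"}),
    "C7": frozenset(),
}
print(is_structured(EXTENTS, top="C1", bottom="C7"))  # True
```

For Table 1 the check succeeds because C3 is connected to the rest of the lattice only through the top and bottom elements. A simple chain of three concepts, by contrast, is reported as unstructured.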
Concept #1 in Figure 4 is the largest concept (excluding the bottom element). It exploits a single feature, “draw text object”. According to the lattice, the feature is largely independent of other features and shares only a few components with other features. Concept #2 stands for the feature “draw arc”, and concept #7 is again a concept that represents shared components for drawing elastic lines while the user is setting points. Concept #3 denotes the feature “draw spline”. Concept #4 has no feature attached and represents the components shared for drawing polygons, polylines, and splines. These components are no real drawing operations but operations to keep a log of the points set by the user and to draw lines between set points while the user is still setting points (a spline first appears as a polygon and is only reshaped when the user has set all points). The difference between concept #7 and concept #4 is that the former only contains the components to draw the elastic line, while the latter adds the capability to set an arbitrary number of points. Arcs do not need this capability because they are defined by exactly three points.

Concept #5 represents the two features “draw polyline” and “draw polygon”. The only difference between these two features is that an additional line is drawn that closes a polygon. This difference is not visible in the concept lattice, since the two features are attached to the same concept. The distinction is made in the body of the function that is called to draw either a polygon or a polyline. Concept #6 represents the feature “draw lines” and is used for drawing rectangles, polygons, and polylines, as one would expect. The generality of this feature becomes […]

[Figure 6. Relevant parts for circles and ellipses (nodes #32 and #38 to #44; e.g., circlebyradius_drawing_selected, elastic_cbr, and resizing_cbr).]

Nodes #41, #42, #43, and #44 represent the features to draw circles and ellipses using either diameter or radius. They all contain three specific components: to draw the object, to plot an elastic bend while the user is drawing, and to resize the object. Note the similarity of the component names. The specific commonalities among circles and ellipses are represented by node #38, which introduces the shared components to draw circles and ellipses (both specified by diameter and radius). Nodes #32 and #39 connect the circles and ellipses to the other objects. No components are attached to nodes #32 and #39; they only merge components from different concepts. The two nodes have a direct infimum (not shown in Figure 6) and add the same components to the circle and ellipse features. The components inherited via these two […]

[…] resulting lattice contained 55 concepts, most of which introduce no new component. We observed that the related shapes, i.e., the variants of splines, circles, ellipses, etc., were merged at the top of the lattice, since they use almost the same components. In order to reduce the size of the lattice, we selected one representative among the related shapes and re-ran the experiment with three shapes (ellipse, polygon, and open approximated spline). The resulting lattice is shown in Figure 7.

[Figure 7. Concept lattice for second experiment.]

[Figure 5. Xfig's object shapes: circle by radius, circle by diameter, ellipse by radii, ellipse by diameters, closed approx. spline, approximated spline, closed interpol. spline, interpolated spline, polygon, polyline, spline, rectangular box, rectangular box with rounded corners, regular polygon, arc, picture object, text, library object.]

This lattice consists of 22 concepts, three of which provide the specific functionality for the respective shapes. Concept #1 (21 functions) depicts the functionality for splines, and concept #2 (17 functions) represents the one for lines (used for polygons). Both are dependent on concept #4 (29 functions), which groups functions related to points. Concept #3 (20 functions) denotes the ellipse feature, concept #5 (29 functions) the general drawing support functionality, and concept #6 (123 functions) the start-up and initialization code of the system. Analyzing concepts #1, #2, and #3, we found that the shapes provide individual rotate functions. In other words, the rotate feature is implemented specifically for each shape, i.e., there is no generic component that draws all different shapes, which would have been an interesting finding in […]

5. Related Research

The mathematical foundation of concept analysis was laid by Birkhoff in 1940. Snelting, in particular, has recently introduced concept analysis to software engineering. Since then, it has been used to evaluate class hierarchies [15], to explore configuration structures of preprocessor statements [10, 14], and to recover components [4, 7, 12, 13].

For feature localization, Chen and Rajlich [5] propose a semi-automatic method in which an analyst browses the statically derived dependency graph; navigation on that graph is computer-aided. Since the analyst more or less takes on all the search, this method is less suited to deriving the feature component map quickly and cheaply. Moreover, the method relies on the quality of the static dependency graph. If this graph, for example, does not contain information on potential values of function pointers, the human analyst may miss functions only called via function pointers. At the other extreme, if the overly conservative assumption is made that every function whose address is taken is called at each function pointer call site, the search space increases extremely. Generally, it is statically undecidable which paths are taken at runtime, so that every static analysis will yield an overestimated search space, whereas dynamic analyses tell exactly which parts are really used at runtime (though for a particular run only). However, […]

[…] our terminology, a set of usage scenarios) is identified that will invoke a feature.
2. The excluding input set E is identified that will not invoke a feature.
3. The program is executed twice, using I and E separately.
4. By comparison of the two resulting execution traces, the components that implement the feature can be identified.

Wilde and Scully focus on localizing rather than deriving required components: for deriving all required components, the execution trace for the invoking input set is sufficient. By subtracting all components in the execution trace for the excluding input set from those in the execution trace for the invoking input set, only those components remain that specifically deal with the feature.

Note that our technique achieves the same effect by considering several execution traces for different features at a time. Components not specific to a feature will “sink” in the concept lattice, i.e., will be closer to the bottom element. More precisely, recall from Section 3.2 that a component, c, is specific to exactly one feature, f, if f is the only feature on all paths from γ(c) to the top element.

One may argue that components that are only required to get the system started, but are not, strictly speaking, directly necessary for any feature, will still appear in the concept lattice when we do not subtract execution traces for an excluding input set. It is true that these components cannot be distinguished from components that in fact contribute to all features, because both kinds of components jointly appear at the bottom element. However, the idea of an excluding input set can be taken over to our technique to distinguish these two kinds of components by providing a usage scenario in which no feature is invoked, like simply starting and immediately shutting down the system without invoking any relevant feature. That simple trick separates the two kinds of components into two distinct concepts, C1 and C2, in the lattice, where C1 < C2, C1 = ⊥, and C2 contains only those components that are really required for all features in a narrower sense.
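Both the input-set subtraction of Wilde and Scully and the empty-scenario trick amount to set operations on execution traces. The following Python sketch treats each trace simply as the set of executed components; all scenario and component names are hypothetical.

```python
def feature_specific(invoking_trace, excluding_trace):
    """Wilde and Scully: components executed for input set I but not for E."""
    return set(invoking_trace) - set(excluding_trace)


def split_bottom(traces, empty_scenario):
    """Separate start-up code from components required by every real feature.

    traces maps each scenario name to the set of components it executed;
    empty_scenario names the run that starts and immediately stops the system.
    Expects at least one scenario besides the empty one.
    """
    features = [s for s in traces if s != empty_scenario]
    shared = set.intersection(*(set(traces[s]) for s in features))
    startup = shared & set(traces[empty_scenario])  # ends up in the bottom element
    return startup, shared - startup                # required for all features proper


traces = {
    "none": {"main", "init_gui"},
    "draw-arc": {"main", "init_gui", "redisplay_canvas", "draw_arc"},
    "draw-text": {"main", "init_gui", "redisplay_canvas", "draw_text"},
}
startup, all_features = split_bottom(traces, "none")
print(sorted(startup), sorted(all_features))  # ['init_gui', 'main'] ['redisplay_canvas']
```

The first returned set corresponds to the bottom element C1 and the second to the concept C2 just above it.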
146-189, April, Concept analysis is a mathematical technique to investi- σ ( { o 1 } ) = { a 1, a 2 } and τ ( { a 7, a 8 } ) = { o 3, o 4 } (objects, attributes, relation) and to interpret the resulting name mangling to resolve overloading) there is not always omitted for readability reasons. However, their size in this shapes provide individual rotate functions. In other words, really required for all components in a narrower sense. components may be used to populate the unified architec- Figure 1. Overview. which share all attributes in the intersection of two sets of (C, F) ∈ R if and only if component C is required section explains how interesting relationships can be auto- Beyond these relationships between components and are many possible combinations. ysis will yield an overestimated search space, whereas 1997. concept lattice accordingly. a direct mapping from the subprograms in the execution Concept #6 represents the feature “draw lines” and is concepts. The two nodes have a direct infimum (not shown Furthermore, our technique goes beyond Wilde and only a menu selection or a similar interaction is necessary. example, we want to investigate the relation between the gate binary relations (see Section 2). attributes. matically derived. In both experiments, we considered subprograms as picture is a linear function of their number of components the rotate feature is implemented specific to each shape, dynamic analyses exactly tell which parts are really used [15]Snelting, G. and Tip, F., ‘Reengineering Class Hierarchies ture. To this end, code needs to be adjusted, reengineered, We want to point out that not all non-functional when feature F is invoked; a subprogram is features, further useful aspects between features on one trace back to the original source. 
Because we dealt in our used for drawing rectangles, polygons, and polylines, as in Figure 6) and add the same components to the circle and In the case of a batch system, one may vary command line concept lattice based on dynamic information and static a1 a2 a3 a4 a5 a6 a7 a8 As already abstractly described in Section 2, the fol- components. However, in our simple implementation, we (except for the bottom element that contains 136 compo- i.e., there is no generic component that draws all different Scully’s technique in that it also allows to derive relevant Using Concept Analysis’, Proc. of the ACM SIGSOFT Sym- or wrapped. However, changing or wrapping the code is Integration into a Product Family Process. A simple requirements, e.g., time constraints, can be easily mapped Graphically, the concept lattice for the example relation 3.1. Context for Feature and Components required when it needs to be executed; a global hand and between components on the other hand may be one would expect. The generality of this feature becomes ellipse features. The components inherited via these two at runtime (though for a particular run only). However, switches and may have to provide different sets of test data software architecture recovery techniques. o1 ! ! lowing base relationships can be derived from the sparse case study with C code, object code names were identical do not handle variable accesses. Hence, not all required nents, mostly initialization and GUI code and very basic shapes, which would have been an interesting finding in relationships between components and features by means posium on the Foundations of Software Engineering, pp. 
99- only done in very late phases in moving toward a product process for feature-based reengineering toward product to components, i.e., our technique primarily aims at func- in Table 1 can be represented as a directed acyclic graph variable is required when it is accessed (used or derived: immediately obvious in the concept lattice as it is located nodes are very basic components of the lowest regions of Chen and Rajlich’s technique could be helpful in a later to invoke a feature. However, in order to find suitable test o2 ! ! ! representation of the lattice (note the duality in the inter- to source names. If this is not the case, one either tolerates functions, and was too large to be drawn accordingly; as a terms of reuse. of concept analysis, whereas Wilde and Scully’s technique 110, November, 1994. family. Reverse engineering can also assist in earlier families can be described as follows: tional features. However, in some cases, it is possible to whose nodes represent concepts and whose edges denote Components will be considered objects of the formal changed); a composite component is required when • If γ(c1) < γ(c2) holds for two components c1 and c2, low-level components are detected. in the middle level of the lattice. the lattice, which indicates that ellipses and circles are phase, in which the system needs to be more rigorously data, one might need some knowledge on internal details References o3 ! ! ! ! ! context, whereas features will be considered attributes. pretation): divergences between names (mostly, names are similar The resulting concepts contain subprograms grouped comparison point: the text drawing concept, marked as only localizes a feature. The derived relationships are an [16]Staudenmayer, N.S. and Perry, D.E., ‘Session 5: Key Tech- phases and, thus, Bayer et al. rightly demand an early inte- 1. 
The economically relevant features are ascertained by isolate non-functional aspects, like security, in code and the superconcept/subconcept relation < as shown in one of its parts is required. The framed area in Figure 4 has a simpler structure widely separate from all other objects. General observations. We made the experience that analyzed. The purpose of our technique is to derive the of a system. o4 ! ! ! ! ! ! Note that in the reverse case, the concept lattice is simply then component c2 requires component c1. enough) or has to reverse name mangling. together according to their usage for features. Note that the node #1, has 29 components). As Figure 4 shows, there are import information to product family experts and represent [1] Bayer, J., Girard, J.-F., Würthner, M., Apel, M., and DeBaud, niques and Process Aspects for Product Line Development’, gration of reverse engineering into a product family product family engineers and market analysts. map them to specific components. For instance, one could Figure 2. The most general concept is called the top ele- In order to obtain the relation, a set of usage scenarios • A component, c, is required for all features at and than the rest of the lattice. This part deals with circles and applying our method is easy in principle. However, run- feature component map. It handles the system as a black The implementation of this technique was surprisingly J.-M., ‘Transitioning Legacy Assets - a Product Line Proc. of the 10th International Software Process Workshop, • If µ(f1) < µ(f2) holds for two features f1 and f2, then a few concepts containing most of the components (i.e., Second experiment. In a second experiment, we analyzed additional dependencies that need to be considered in a approach [1]. Early reverse engineering is needed to derive concentrate all network accesses in one single component Table 1: Example relation. ment and is denoted by . 
The most special concept is inverted but the derived information will be the same. needs to be prepared where each scenario exploits prefera- above γ(c) – as defined by (1) – in the lattice. more general subprograms can be found at the lower con- ellipses and its details are shown in Figure 6. Each node, ning all scenarios by hand is time consuming. It may be box and, hence, does not give insights in internal aspects simple. We opportunistically put together a set of publicly Approach’, Proceedings of the SIGSOFT Foundations of June 1996, Ventron FR. 2. The feature component map is derived based on the called the bottom element and is denoted by ⊥ . The set of relevant features will be determined by the 4. Case Study cepts in the lattice since they are used for many features, subprograms) of the system. The lattice contains 47 con- the edit mode rotate which comes in two variants: clock- decision for certain features and components. first coarse information on existing system components identified relevant features. to enable controlled secure connections. A pair (O, A) is called concept if A = σ ( O ) ∧ O = τ ( A ) bly only one relevant feature. Then the system is used • A feature, f, requires all components at and below µ(f) feature f1 is based on feature f2. cepts. 26 of them introduce at least one new component, N, in Figure 6 contains two sets: The upper set contains all wise and counterclockwise. The first ten shapes in facilitated by the presence of test cases that allow an auto- with respect to quality and effort. available tools and wrote a few Perl scripts (140 LOC in Software Engineering, Toulouse, pp. 446-463, Association of [17]Wilde, N. and Scully, M.C., ‘Software Reconnaissance: (assets) timely needed by a product family analyst to The remainder of this article is organized as follows. The combination of the graphical representation in product family experts. 
For components, we can consider while specific components are in the upper region of the mated replay of various scenarios. Wilde and Scully [17] also use dynamic analysis to Computing Machinery (ACM), 1999. according to the set of usage scenarios, one at a time, and – as defined by (2) – in the lattice. One has to note that the latter relationship between fea- As a case study, we analyzed the Xfig system [18] components attached to the node, i.e., those components, total) for interoperability, which took us just one day. A Mapping Program Features to Code’, Software Maintenance: investigate feasibility and to estimate costs of different 3. The previously derived feature component map gives Section 2 introduces concept analysis. Section 3 explains holds, i.e., all objects share all attributes. For a concept c = Figure 2 and the contents of the concepts in Table 2 the following alternatives depending on how much knowl- lattice. Hence, the concept lattice also reflects the level of i.e., to these nodes, a component is attached (more pre- c, for which γ(c) = N; the lower set contains all features of Figure 5 were drawn and rotated once clockwise and once Because Xfig has a GUI, running a single scenario by localize features as follows: 6. Conclusions drawback of our simple implementation is that one has to [2] Bosch, J., ‘Product-Line Architectures in Industry: A Case Research and Practice, vol. 7, pp. 49-62, 1995. (O, A), O is the extent of c, denoted by extent(c), and A is the execution traces are recorded. An execution trace con- • A component, c, is specific to exactly one feature, f, if tures safely holds for the analyzed system only, i.e., this (version 3.2.1) consisting of about 76 KLOCs written in cisely, a concept C introduces a component if there exists a counterclockwise, which resulted in 20 scenarios. 
The alternative ways to get to a suitable product family archi- additional insights into dependencies among features how concept analysis can be used to derive the feature together form the concept lattice. The complete informa- edge on the system architecture is already available: abstraction of these subprograms within the given set of N, including those inherited from other concepts. The hand is an easy task. However, one has to pay attention not run the system for each usage scenario from the beginning Study’, Proc. of the 21st International Conference on Soft- [18]Xfig system, http://www.xfig.org. tains all required low-level components for a usage sce- f is the only feature on all paths from γ(c) to the top relationship is not necessarily true for the features as such, the programming language C. In this section, we will 1. The invoking input set I (i.e., a set of test cases or – in A feature component map describes which components ware Engineering (ICSE’99), (Los Angeles, CA, USA), pp. 2 3 4 5 6 7 8 9 10 1 Derivation of Feature Component Maps by means of Concept Analysis grams a set of components C. A component corresponds to 3.2. Interpretation of the Concept Lattice 3.3. Implementation an object of the formal context, whereas a feature will be considered an attribute. Concept analysis applied to the formal context The implementation of the described approach is sur- Thomas Eisenbarth, Rainer Koschke, Daniel Simon described in the previous section gives a lattice, from prisingly simple (if one already has a tool for concept The relation R for the formal context necessary for con- University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany cept analysis is defined as follows (where c ∈ C, f ∈ F): which interesting relationships can be derived. These rela- analysis). 
Our prototype for a Unix environment is an (c, f) ∈ R if and only if component c is required tionships can be fully automatically derived and presented opportunistic integration of the following parts: {eisenbts, koschke, simondl}@informatik.uni-stuttgart.de to the analyst such that the complicated theoretical back- when feature f is invoked; a subprogram is • Gnu C compiler gcc to compile the system using a com- required when it needs to be executed. ground can be hidden. The only thing an analyst has to mand line switch for generating profiling information, Abstract larly interesting and required components, and further know is how to interpret the derived relationships. expensive analyses can be aimed at selected components. R can be visualized using a relation table as shown in • Gnu object code viewer nm, Feature component maps describe which components The following base relationships can be derived from This paper describes a quickly realizable technique to Figure 1: the sparse representation of the lattice: • Gnu profiler prof, are needed to implement a particular feature and are used ascertain the feature component map based on dynamic early in processes to develop a product line based on exist- f1 f2 f3 f4 f5 f6 f7 f8 • A component, c, is required for all features at and above • concept analysis tool concepts [8], information (gained from execution traces) and concept ing assets. This paper describes a new technique to derive analysis. The technique is automatic to a great extent. c1 ! ! γ(c) in the lattice. • graph editor Graphlet [3] to visualize the concept lattice, the feature component map and additional dependencies c2 ! ! ! • A feature, f, requires all components at and below µ(f) in • and a short Perl script to ascertain the executed functions utilizing dynamic information and concept analysis. The The remainder of this article is organized as follows. c3 ! ! ! ! ! the lattice. 
in the execution trace and to convert the file formats of method is simple to apply, cost-effective, largely language Section 2 gives an overview, Section 3 explains how con- c4 ! ! ! ! ! ! concepts and Graphlet (the script has just 225 LOC). • A component, c, is specific to exactly one feature, f, if f independent, and can yield results quickly and very early cept analysis can be used to derive the feature component Figure 1. Relation Table is the only feature on all paths from γ(c) to the top ele- The fact that the subprograms are extracted from the in the process. map and Section 4 describes our experience with this tech- The resulting concept lattice is shown in Figure 2. We ment. object code makes the nique in an example. Section 5 references related research, use the sparse representation for visualization showing an • A feature, f, is specific to exactly one component, c, if c 1. Introduction Section 6 concludes the paper. attribute/feature at the uppermost concept in the lattice is the only component on all paths from µ(f) to the bot- Developing similar products as members of a product where it is required (so the attributes spread from this node tom element (i.e, c is the only component required to line promises advantages, like higher potential for reuse, 2. Overview down to the bottom). For a feature f, this node is denoted implement feature f). lesser costs and shorter time to market. There are many by µ(f). Analogously, a node is marked with an object/ • Features to which two components, c1 and c2, jointly The technique described here is based on the execution approaches to newly developing product lines from component c ∈ C in the sparse representation if it repre- contribute can be identified by γ(c1) ∧ γ(c2); graphically traces generated by a profiler for different usage scenarios. scratch [2, 10]. However, according to Martinez in [15], sents the most special concept that has c in its extent. 
This One scenario represents the invocation of one single fea- depicted, one ascertains in the lattice the closest com- most successful examples of product lines at Motorola unique node is denoted by γ(c). Hence, an object/compo- ture and yields all subprograms executed for this feature. mon node toward the top element starting at the nodes to originated in a single separate product. Only in the course nent c spreads from the node γ(c), to which it is attached, These subprograms identify the components. The required which c1 and c2, respectively, are attached; all features at of time, a shared architecture for a product line evolved. up to the top. components for all scenarios and the set of features are and above this common node are those jointly imple- Moreover, large investments impose a reluctance against then subject to concept analysis. Concept analysis gives f5 applies to f3, f4 mented by these components. introducing a product line approach that ignores existing information on relationships between features and these concepts assets. Hence, introducing a product line approach has f1, f2 f5 f6, f7, f8 • Components jointly required for two features, f1 and f2, required components as well as feature-feature and com- generally to cope with existing code. ponent-component dependencies. c1 c2 c3 c3 applies to are described by µ(f1) ∨ µ(f2); graphically depicted, one Reverse engineering helps creating a product line from c4 these concepts ascertains in the lattice the closest common node toward existing systems by identifying and analyzing the compo- Concept Analysis. Concept analysis is a mathematical the bottom element starting at the nodes to which f1 and nents and deriving the individual architectures. They can technique that provides insights into binary relations. The < concept Figure 2. 
Concept Lattice f2, respectively, are attached; all components at and then be unified to a product line architecture which is pop- mathematical foundation of concept analysis was laid by Birkhoff in 1940. The binary relation in our specific appli- In order to ascertain the relation table, a set of usage below this common node are those jointly required for ulated by the derived components. cation of concept analysis to derive the feature component scenarios needs to be prepared where each scenario trig- these features. As stated in Bayer et. al [1], early reverse engineering is needed to derive first coarse information on existing map states which components are required when a feature gers exactly one relevant feature1. Then the system is used • Components required for all features can be found at the assets needed by a product line analyst to set up a suitable is invoked. The detailed mathematical background of con- according to the set of usage scenarios. For each usage bottom element. product line architecture. cept analysis can be found in [7,13,14]. scenario, the execution trace is recorded. • Features that require all components can be found at the One important piece of information for a product line An execution trace contains all called subprograms for top element. analysis that tries to integrate existing assets is the so- 3. Feature Component Map a usage scenario or an invoked feature, respectively. The information described above can be derived by a called feature component map that describes which com- Hence, each system run yields all required components for tool and fed back to the product line expert. As soon as a ponents are needed to implement a particular feature. A In order to derive the feature component map via con- a single scenario that exploits one feature. 
A single col- decision is made to re-use certain features, all components feature is a realized (functional as well as non-functional) cept analysis, one has to define the formal context umn in the relation table can be obtained per system run. required for these features (easily derived from the con- requirement (the term feature is intentionally weakly (objects, attributes, relation) and to interpret the resulting Applying all usage scenarios provides the relation table. cept lattice) form a starting point for further static analyses defined because its exact meaning depends on the specific concept lattice accordingly. to investigate quality (like maintainability, extractability, context). Components are computational units of a soft- and integrability) and to estimate effort for subsequent 3.1. Context for Feature and Components ware architecture. steps (wrapping, reengineering, or re-development from On the basis of the feature component map and addi- 1. It is possible to combine multiple features into one scenario, mak- The set of relevant features F will be determined by the ing the interpretation of the resulting concept lattice more compli- scratch). tional economic reasons, a decision is made for particu- product line experts. We consider all the system’s subpro- cated. This is beyond the scope of this paper.
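The base relationships of Sections 3.1 and 3.2 can be computed directly from extents and intents, without drawing the lattice. The Python sketch below illustrates this. It is not the authors' prototype, which combined gcc, nm, prof, the concepts tool, Graphlet, and a Perl script.

The sketch makes these assumptions:
• Execution profiles are already available as sets of executed subprogram names, with one scenario per feature.
• The names in TRACES and all helper function names are hypothetical.
• Concepts are enumerated naively by closing the component rows under intersection, which is only practical for small contexts.

The sketch relies on two facts: the intent of γ(c) is the set of features of c, and the extent of µ(f) is the set of components required by f.

```python
from collections import defaultdict

# Hypothetical execution profiles: one usage scenario per feature, each mapped
# to the set of subprograms (components) executed at least once in that run.
TRACES = {
    "draw_circle": {"init", "draw_shape", "circle_geom"},
    "draw_ellipse": {"init", "draw_shape", "ellipse_geom"},
    "move_circle": {"init", "draw_shape", "circle_geom", "translate"},
}


def relation_from_traces(traces):
    """Build R as component -> features; each trace fills one column of the table."""
    rel = defaultdict(set)
    for feature, components in traces.items():
        for component in components:
            rel[component].add(feature)
    return {c: frozenset(fs) for c, fs in rel.items()}


def sigma(rel, comps):
    """Features common to all given components (every feature if comps is empty)."""
    feats = frozenset().union(*rel.values())
    for c in comps:
        feats &= rel[c]
    return feats


def tau(rel, feats):
    """Components that are required for all given features."""
    feats = frozenset(feats)
    return frozenset(c for c, fs in rel.items() if feats <= fs)


def concepts(rel):
    """All concepts (extent, intent); intents are intersections of component rows."""
    intents = {sigma(rel, ())}
    for row in rel.values():
        intents |= {i & row for i in intents}
    return [(tau(rel, i), i) for i in intents]


def gamma(rel, c):
    """Most special concept whose extent contains component c."""
    intent = sigma(rel, {c})
    return tau(rel, intent), intent


def mu(rel, f):
    """Most general concept whose intent contains feature f."""
    extent = tau(rel, {f})
    return extent, sigma(rel, extent)


def specific_components(rel):
    """c is specific to f if f is the only feature at or above gamma(c)."""
    return {c: next(iter(fs)) for c, fs in rel.items() if len(fs) == 1}


def specific_features(rel):
    """f is specific to c if c is the only component at or below mu(f)."""
    result = {}
    for f in sigma(rel, ()):
        extent = tau(rel, {f})
        if len(extent) == 1:
            result[f] = next(iter(extent))
    return result


def jointly_implemented(rel, c1, c2):
    """Features both components contribute to: intent of gamma(c1) join gamma(c2)."""
    return gamma(rel, c1)[1] & gamma(rel, c2)[1]


def jointly_required(rel, f1, f2):
    """Components both features require: extent of mu(f1) meet mu(f2)."""
    return mu(rel, f1)[0] & mu(rel, f2)[0]


if __name__ == "__main__":
    rel = relation_from_traces(TRACES)
    for extent, intent in sorted(concepts(rel), key=lambda x: (-len(x[0]), sorted(x[0]))):
        print(sorted(extent), sorted(intent))
    print("specific components:", specific_components(rel))
    print("required for all features:", sorted(tau(rel, sigma(rel, ()))))
```

Running the module prints the five concepts of this hypothetical context. They range from the top (all components, no common feature) down to the bottom (init and draw_shape, which are required for all three features). The output also shows that ellipse_geom and translate are specific to draw_ellipse and move_circle, respectively.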
The paper was selected for a TSE special issue on ICSM. Before we submitted the paper, we asked the editors about the page limit. They told us there was no limit.
[Slide: the first seven pages of the TSE article "Locating Features in Source Code" by Thomas Eisenbarth, Rainer Koschke, and Daniel Simon. The pages cover the abstract, the overview of the technique, the background on formal concept analysis, and the feature location process. The following notes are overlaid:]
So we submitted a paper with 20 pages.
The reviewers asked us to add more detail, and the version that was finally accepted had 25 pages.
The analyst of FIG may first be interested in tal exploration of features while preserving the “mental those that form self-contained and understandable feature- adequate for smaller systems or parts of a system where Analogously, the set of common objects τ (A) for a set of superconcept-subconcept relation ≤ as shown in Fig. 5(a). u2 jointly contribute can be identified by the supremum Sect. II gives an overview of our technique and introduces lyst for her task at hand. In the following, we assume that been identified through our technique, other techniques, All activities except the static dependency graph extrac- 3.2 Scenario execution: The system is executed by a the two different ways to draw a circle. She would therefore map” the analyst has gained through the analysis. specific computational units. Computational units that more detail is needed due to the likely information over- attributes A ⊆ A is defined as: The most general concept is called the top element and γ(u1 ) γ(u2 ). In the lattice, the supremum is the closest the basic concepts. Sect. III introduces concept analysis. only a subset of features is relevant. Consequently, only such as static or dynamic slicing [8], [9], can be used to tion (which is done only once) benefit from the knowledge user according to the scenarios and execution profiles are select the two scenarios Draw-circle-diameter and Draw- Keywords— program comprehension, formal concept anal- are only very basic computational units used as building load to the analyst. is denoted by . The most special concept is called the ({o2 }, {a2 , a4 , a5 , a7 }) ({o2 }, {a2 }) common node toward the top element starting at the nodes Sect. IV describes the process for locating and analyzing the computational units required for these features are of obtain the order of execution if required. These techniques that is gained in previous iterations and can be applied re- recorded. circle-radius. 
When she understands the differences be- ysis, feature location, program analysis, software architec- blocks for other computational units but not containing any For practical reasons, for this paper we decided to use τ (A) = {o ∈ O | (o, a) ∈ I for all a ∈ A} (2) bottom element and is denoted by ⊥. to which u1 and u2 are attached. All scenarios at and above ture recovery features in more detail. In Sect. V, we report on two case interest, too. The feature-unit map—as one result of can then be applied more goal-oriented by focusing on the (∅, {a1 , a2 , a3 , a4 , a5 , a6 , a7 }) (∅, ∅) peatedly until sufficient knowledge about the system has If suitable tool support is available, a scenario’s execu- tween these two features, she would investigate other circle application-specific logic are sorted out. Additional static routines as the computational unit of choice, where a rou- The concept lattice can be visualized in a more readable this common node are those jointly implemented by u1 and studies conducted to validate our approach. The related our technique— describes which computational units im- most feature-specific computational units yielded by our been gained. The order of the activities is specified by tion may be recorded at wish to exclude parts of the execu- operations and additionally select Move-circle and Color- analyses, like strongly connected component identification, tine is a function, procedure, subprogram, or method ac- A formal context can be represented by a relation table, equivalent way by marking only the graph node with an (a)Full concept lattice. (b)Sparse representation. u2 . For instance, setDiameter and color jointly contribute research in the area is summarized in Sect. VI. plement a given set of relevant features. technique. the IDEF0 diagram in Fig. 6: An activity may start once tion that are not relevant, such as start-up and shutdown of circle. 2 I. 
Introduction dominance analysis, and program slicing [8] support the cording to the programming language. For the case studies where the columns hold the objects and the rows hold the attribute a ∈ A whose represented concept is the most gen- to Color-circle according to Fig. 10. Scenario. Features are abstract descriptions of a system’s Feature-unit map. Our technique derives the feature-unit search for the units of interest. attributes. An object oi and attribute aj are in the rela- eral concept that has a in its intent. Analogously, a node Fig. 5. The concept lattices for the example context in Fig. 4. its input is available. The activities are explained in the the system [24], [25], [26]. Certain debuggers, for instance, • Computational units jointly required for two scenarios s1 presented later on in this paper, routines were appropriate. D.2 Concept Analysis U NDERSTANDING how a certain feature is imple- mented is a major problem of program understand- ing. Before real understanding starts, one has to locate II. Overview The goal of our technique is to identify the computa- expected behavior. If a user wants to invoke a feature of a system, he needs to provide the system with adequate input map through concept analysis, a mathematically sound technique. In our application of concept analysis, concept For large and complex systems, our approach can be ap- plied incrementally as described in this paper. Static and dynamic dependencies. The results from concept tion I iff the cell at column i and row j is marked by ”×”. As an example, a binary relation between arbitrary objects will be marked with an object o ∈ O iff it represents the most special concept that has o in its extent. The unique following sections. A. Static Dependency Graph Extraction allow to start and end trace recording. 
Instrumenting the source code so that only relevant parts are recorded is gen- This process embodies a completely automated step that creates a concept lattice from the invocation table. and s2 are described by the infimum µ(s1 ) µ(s2 ). In the lattice, the infimum is the closest common node toward the When we subm tted the camera-ready the product on peop e stepped n tional units that specifically implement a feature as well as to trigger the feature. For instance, to draw a circle, the analysis—simply stated—mutually intersects the execution analysis based on dynamic information are used to guide and attributes is shown in Fig. 4(a). For that formal con- element in the concept lattice marked with a is therefore: erally not an option because this requires that the feature- bottom element starting at the nodes to which s1 and s2 the implementation of the feature in the code. Systems the set of jointly or distinctly required computational units user of FIG needs to press a certain button on the control profiles for all scenarios and all resulting intersections to Applicability the analyst in her static analysis, that is, her inspection of text, we have: The static dependency graph should subsume all types unit map is at least partially known already. In order to derive the feature-unit map by means of con- are attached. All computational units at and below this need for additional scenarios (incremental analysis) often appear as a large number of modules each contain- for a set of features. To this end, the technique combines panel for selecting the circle drawing operation, then to obtain the specific computational units for a feature and the static dependency graph. We use dynamic information µ(a) = {c ∈ L(C) | a ∈ intent(c)} (8) of entities and dependencies present in the dynamic depen- An alternative solution is to specify a special “start-end” cept analysis, we have to define the formal context (i.e., the common node are those jointly required for s1 and s2 . 
For The retrieval of the feature-unit map is based on dynamic ing hundreds of lines of code. It is in general not obvious static and dynamic analyses. position the cursor on the drawing area for specifying the the jointly and distinctly required computational units for only as a guide and not as a definite answer because dy- σ({o1 }) = {a1 , a4 , a6 , a7 } dency graph: It is unnecessary to extract dynamic informa- scenario containing the actions to be filtered out. For in- objects, the attributes, and the relation) and to interpret instance, setDiameter and draw are jointly required for information where all computational units that are exe- which parts of the source code implement a given feature. This section gives an overview on our technique, de- center of the circle, to specify the diameter by moving the a set of features. namic information depends upon suitable input data and τ ({a6 , a7 }) = {o1 , o3 } The unique element marked with object o is: (initially) tion that is not used in the subsequent static analysis. Yet, stance, in order to mask out initialization and finalization the resulting concept lattice accordingly. Move-circle and Color-circle according to Fig. 10. cuted for a scenario are collected. The scenario describes relevant Typically existing documentation is outdated (if it exists at scribes the relationships among features, scenarios, and mouse, and eventually to press the left mouse button for Example. FIG allows to draw a circle either by diameter the test environment in which the scenarios are executed. features scenario scenarios the static dependency graph may provide additional types code, the domain expert may prepare a “start-end” sce- The formal context for applying concept analysis to de- • Computational units required for all scenarios can be how to invoke a feature. 
This section describes the as- γ(o) = {c ∈ L(C) | o ∈ extent(c)} (9) all), the system’s original architects are no longer available, computational units (summarized in Fig. 1) and explains finalizing the circle. Such sequences of user inputs that or by radius. The analyst who is interested in the differ- The static dependency graph can be extracted from pro- A tuple c = (O, A) is called a concept iff A = σ(O) creation filter, granularity of entities and dependencies and also more fine-grained in- nario in which the system is started and immediately shut rive the relationships between scenarios and computational found at the bottom element; for instance, draw is required sumptions on features, scenarios, and computational units 1 or their view is outdated due to changes made by others. what kind of dynamic information is used as input to our trigger actions of a system with observable result [6] are ences of these two circle operations and their differences to cedural, functional, as well as object-oriented programming and O = τ (A), that is, all objects in c share all attributes formation if a static extraction tool is used that exceeds down. units will be laid down as follows: for all scenarios according to Fig. 10. we make. We will call a graph representing a concept lattice using So maintenance introduces incoherent changes which cause technique. The section also introduces a simple example called scenarios. other circle operations, such as moving and coloring, will languages. Because execution profiles can be recorded for in c. For a concept c = (O, A), O is called the extent of execution the capabilities of the available dynamic extraction tool. Since each scenario is a precise description of the se- • Computational units will be considered objects. • Scenarios that require all computational units can be domain the system’s overall structure to degrade [1]. Understand- Features. 
Our technique is primarily suited for functional this marking strategy a sparse representation of the lat- expert c, denoted by extent(c), and A is called the intent of c, interpretation that we will use throughout the description of the method Our technique requires a set of scenarios that invoke the set up the scenarios listed in Fig. 2. Figure 3 lists the these languages, too, our technique is applicable to all these dynamic profiles of concept In this case, the static analysis can leverage less dynamic quence of user inputs that trigger actions of the system, • Scenarios will be considered attributes. found at the top element. In Fig. 10, there is no such ing the system in turn becomes harder any time a change features that may be mapped onto computational units. denoted by intent(c). Informally speaking, a concept cor- tice. The equivalent sparse representation of the lattice in analysis information but is still conservative. In our case studies, for every execution of a scenario yields the same execution • A pair (computational unit u, scenario s) is in relation I in the following sections. The example is inspired by a pre- features the analyst is interested in. A scenario s invokes computational units executed for the scenarios in Fig. 2. languages. However, the precision of the static extrac- 3 lattice 4 scenario. is made to it. In particular, non-functional features, such as robustness, responds to a maximal rectangle of filled table cells modulo Fig. 5(a) is shown in Fig. 5(b). The content of a node N instance, we extracted many detailed static dependencies profile unless the system is nondeterministic. In case of if u is executed when s is performed. 
vious case study [4] in which we analyzed the drawing tool a feature f if f ’s result can be observed by the user when Intersecting the execution profiles shows that setRadius is tion influences the ease of the analyst’s inspection of the source feature− Beyond these relationships between computational units One option, when trying to escape this vicious circle, reliability, or maintainability, do not easily map to compu- row and column permutations. In Fig. 4(b), all concepts in this representation can be derived as follows: code among global declarations (routines, global variables, and nondeterminism, one could either unite the profiles of all static dependencies, and static analysis is inherently more compiler the system is used as described by scenario s. A scenario specific to feature Draw-circle-radius, move to Move-circle, Figure 7 shows how to map the identifiers used in the and scenarios, further useful aspects between scenarios on analysis XFIG [5]. unit concept profiler analyst is to completely reverse engineer the system in order to tational units. for the relation in Fig. 4(a) are listed. • The objects of N are all objects at and below N . map user-defined types) but the profiler we used let us extract executions of the same scenario or differentiate each sce- They to d us that we have on y 12 pages may invoke multiple features and features may be invoked and color to Color-circle. difficult for object-oriented languages (and for functional general description of concept analysis in Sect. III to the user 2 one hand and between computational units on the other tool exhaustively identify its components and to assign fea- Computational unit. A computational unit is an exe- The technique is suited only for features that can be in- languages with higher-order functions) than for procedural The set of all concepts of a given formal context forms a • The attributes of N are all attributes at and above N . only the dynamic call relationship among routines. 
This nario execution. The latter is useful to identify differences identifiers used in the specific instantiation of concept anal- by multiple scenarios. For instance, a scenario for moving hand may be derived: tures to components. We integrated published automatic cutable part of a system. Examples for computational scenario name actions performed voked from outside; internal implementation features, such languages. partial order via the superconcept-subconcept ordering ≤: For instance, the node in Fig. 5(b) marked with o1 and a1 way, we had to analyze static variable accesses that might due to nondeterminism. ysis within our method. a circle requires to draw the circle first, so this scenario static • If γ(u1 ) < γ(u2 ) holds for two computational units u1 techniques for component retrieval in an incremental semi- units are instructions (like accesses to global variables), as the use of a garbage collector, may not necessarily be is the concept c4 = ({o1 }, {a1 , a4 , a6 , a7 }). have never been executed in any of our scenarios. also invokes feature “circle drawing”. There may be even Draw-circle-diameter draw a circle by diameter Static analyses need to make conservative assumptions static dependency dependency graph dependency The system is used according to the set of scenarios, one and u2 , then computational unit u2 is more specific with automatic process, in which the results of selected auto- basic blocks, routines, classes, compilation units, compo- deterministically and easily triggered from outside. (O1 , A1 ) ≤ (O2 , A2 ) ⇔ O1 ⊆ O2 (3) For practical reasons, it is sometimes useful to apply only graph extraction 5 D. Interpretation of Concept Lattice different scenarios all invoking the same set of features. Draw-circle-radius draw a circle by radius in the presence of pointers and dynamic binding, which 2 analysis statically at a time, and the execution profiles are recorded. 
Each respect to the given scenarios than computational unit u1 matic techniques are validated by the user [2]. nents, modules, or subsystems. The exact specification of one of (8) or (9). For example if we have a large number of B. Scenario Creation Each scenario, then, represents an alternative way of in- Move-circle draw a circle by diameter Scenarios. Scenarios are designed (or selected from existing weaken the precision of the dependency graph. Fortu- or, dually, with validated In this process step, a concept lattice for the relation system run yields all executed computational units for a because u1 contributes not just to the features for which u2 However, exhaustive methods are not cost-effective. For- a computational unit is a generic parameter of our method. attributes but just a small number of objects, we eliminate dependency feature− voking the features. For instance, FIG allows a user to and move it test cases) to invoke a known set of relevant features; that nately, research in pointer analysis has made considerable unit map A domain expert is needed for creating the scenarios. table created by process step 3 is built. The goals of inter- single scenario; that is, one column of the relation table contributes, but also to other features. For instance, color the redundant appearance of attributes and keep the full extractor analyst tunately, knowledge of components implementing a spe- Feature. A feature is a realized functional requirement of push a button or to use a keyboard shortcut to begin a cir- Color-circle draw a circle by diameter is, we assume that the analyst knows in advance which progress. There is a large body of work on pointer analy- (O1 , A1 ) ≤ (O2 , A2 ) ⇔ A1 ⊇ A2 (4) Any available information on the system’s behavior (e.g., preting the resulting concept lattices are: can be filled per system run. Applying all scenarios that is more specific to Color-circle than setDiameter and set- graph list of objects in the concepts. 
human involvement cific set of features suffices in many cases. Consequently, a system (the term feature is intentionally defined weakly cle drawing operation. A set of scenarios each representing and color it features are invoked by a scenario. sis for procedural languages [10], [11], [12], [13], [14], [15], (not part of IDEF0 notation) documentation, existing test cases, domain models, etc.) is 1. Identification of the relationships between scenarios and have been selected during the process of scenario selection Diameter is more specific than draw according to Fig. 10. because its exact meaning depends on the specific context). options and choices for the same feature resembles a use Fig. 2. Example scenarios for FIG. Because suitable scenarios are essential to our technique, [16], [17] and object-oriented languages [18], [19] that re- Note that (3) and (4) imply each other by definition. If IV. Analysis Process useful as input to him. Existing test cases may be useful computational units (process steps 4.1–4.3) provides the relation table for formal concept analysis. • If µ(s1 ) < µ(s2 ) holds for two scenarios s1 and s2 , then T. Eisenbarth, R. Koschke, and D. Simon are with the In- Generally, the term feature also subsumes non-functional case. a domain expert is needed to set up scenarios. In many solves general pointers, function pointers, and dynamic we have c1 ≤ c2 , then c1 is called a subconcept of c2 and Fig. 6. Process for feature location in IDEF0 notation. but not necessarily directly applicable, because the focus 2. Identification of the relationships between scenarios and Example. Figure 10 shows the concept lattice for the scenario s2 is based on scenario s1 because if s2 is executed, stitute of Computer Science at the University of Stuttgart, Our process to locate features is depicted in Fig. 6 using Breitwiesenstrae 20–22, D-70565 Stuttgart, Germany. E-mail: requirements. 
In the context of this paper, only functional Scenarios are used in our technique to gather the com- cases, the domain expert can reuse existing test cases as binding. These techniques vary in precision and costs. c2 is called superconcept of c1 . For instance, in Fig. 4(b) during testing is to cover the code completely and to com- features and thus between features and computational invocation table in Fig. 3, where all scenarios have been all computational units in the extent of µ(s1 ) need also to {eisenbarth,simon,koschke}@informatik.uni-stuttgart.de. features are relevant; that is, we consider a feature an ob- putational units for the relevant features through dynamic Beyond simply identifying the computational units scenarios to locate features. However, the purpose of test Interestingly enough, Milanova and others have recently we have c4 ≤ c2 . the IDEF0 notation [23]. It consists of five major activities: bine features in many ways. Scenarios in our sense are very units (process step 4.4) selected. 2 be executed. For instance, Move-circle and Color-circle 8 f1 f2 f3 u1 u2 u3 u4 u5 u6 u7 9 10 11 12 13 Meanwh e the ed tor- n-ch ef was rep aced and the ru es had changed scenarios cause u7 is also used in scenarios not invoking f1 at all. and color-circle even if Draw-circle-diameter is not consid- However, this gives us only a set of computational units, E.2 Inspection of the Static Dependency Graph ful while navigating on the dependency graph: ment, the individual switch branches would be more clearly mapped onto concepts in the superconcept along with pos- The paper cons sted of ma n y two parts the theoret ca paper descr b ng s1 × × × × × × Cspc: u1 and u2 are executed only in scenarios invoking ered. 
However, Draw-circle-diameter is useful to separate but it is not clear which of these computational units are Next, we inspect the executable static dependency graph • Strongly connected component analysis is used to iden- assigned to the respective feature in the concept lattice. sible user annotations. Additionally, an incremental auto- s2 × × × × × × s3 × × × × × × f1 . They are less specific than u4 because they are not used draw from setDiameter. 2 truly feature-specific and which of them are rather general- (as one specific subset of the static dependency graph) that tify cycles in the dependency graph: If there is one compu- In this section, we describe an incremental consideration matic graph layout can be chosen: Only additional nodes source execution (a)Invocation relation I. in all scenarios that invoke f1 ; that is, these computational As a matter of fact, there could be several concepts for purpose computational units used as building blocks for contains all transitive control-flow successors and predeces- tational unit in a cycle that contains feature-specific code, of attributes, namely, scenarios. Incremental consideration and e code compile for scenario profiles units are only conditionally specific. Whether u1 and u2 are which condition (10) holds when different computational other computational units. Given a feature f of interest, sors of computational units in Sstart (f ). We concentrate on all computational units of the cycle are related to the fea- of objects—that is, refinement of computational units—is recording execution ({u1 , u2 , u3 , u4 , u5 , u6 , u7 }, ∅) (∅, ∅) more or less specific than u7 is not decidable based on the units are executed for the given feature, depending on the this question can be answered as follows: ture because of the cyclic dependency. analogous. 3.1 executable 3.2 computational units here because they are the active con- ({u2 }, {s2 }) concept lattice. 
On one hand, they are used in all scenarios scenario contexts in which the feature is embedded. For • As a first approximation, all computational units in the • Dominance analysis is used to identify computational As soon as one understands the basics of a system, one ({u2 , u4 , u5 , u7 }, {s2 }) stituents and because they were subject to the dynamic invoking f1 and other scenarios, whereas u7 is also executed instance, let us assume we are analyzing FIG’s undo ca- extents of all feature-specific concepts for f jointly con- analysis. The executable static dependency graph can be units that are local to other computational units. A com- adds new scenarios for further detailed investigation and compiler Cspc Irlvt profiler in scenarios that do not require f1 . On the other hand, u7 pabilities. Three scenarios can be provided to explore this tribute to f . annotated with the features and scenarios for which the putational unit u1 dominates another computational unit exploration of the unknown portions of the system. If one user ({u1 , u4 , u6 , u7 }, {s1 }) ({u3 , u5 , u6 , u7 }, {s3 }) ({u1 }, {s1 }) ({u3 }, {s3 }) is executed whenever f1 is required, whereas u1 and u2 are feature: u2 if every path in the dependency graph from its root to tries to capture all features of a software at once, the re- • The analyst refines this approximation by adding and re- computational units were executed. If a computational not executed in some scenarios that do require f1 . • Draw a circle: {draw-circle} moving computational units: By inspecting the static de- unit is not annotated with any scenario, the computational u2 contains u1 . In other words, u2 can be reached only sulting lattice may become too large, too detailed, and thus Fig. 8. The process for the dynamic analysis in Fig. 6. Spec Shrd Shrd: u5 and u6 are executed in scenarios invoking f1 but • Undo circle drawing: {draw-circle, undo} pendency graph and the source code of the computational unit was not executed. 
[Slide images: pages of the paper, with the figure captions “Fig. 9. The process for interpretation of the concept lattice in Fig. 6.”, “Fig. 10. Sparse concept lattice for Fig. 3.”, and “Fig. 11. Categorizing concept lattices.” The pages are overlaid with the annotations “the method and the evaluations with case studies”, “We had to cut the paper and published only the theoretical part”, “Today the day has come to present the case study we conducted for”, “TSE: After 9 years, finally!”, and “Sometimes you get a second chance in life”.]
  • 16.
    The problem we were trying to solve in the paper can be explained very simply with an example. This is a screenshot of the drawing tool XFig. It allows you to draw graphical objects such as circles, rectangles, and text. Suppose you were a developer assigned to extend XFig; for instance, your task is to add triangles. As you know, XFig was developed by someone else, not you. Most likely, you would first like to understand how it draws the existing objects. The very first problem in doing so is to locate the code that implements these features.
  • 17.
    Here is the call graph of XFig. Now, where would you start?
  • 18.
    This problem is known as feature location; another term used is concept location. Feature location answers the question “Where does this program do X?”, as Norman Wilde phrased it back in 1992. Norman Wilde is a pioneer in feature location. He received the most-influential paper award for ICSM 1992 for his work on feature location. This year’s best-paper award went to a feature location paper, too. There seems to be a tradition at ICSM of most-influential papers related to feature location.
    [Slide: “Where does this program do X?” (Norman Wilde, 1994)]
  • 19.
    His technique works as follows. Because the technique is based on dynamic information, you first need to compile your program.
    [Slide: source code → compiler → executable]
  • 20.
    Then, you run the program, invoking the relevant feature X, and record every piece of code that was executed. All that code is relevant to the feature. But it may also be executed when other features are executed; thus, it may contain code that is not really specific to the feature of interest, for instance, the main function.
    [Slide: source code → compiler → executable; invoke with feature (invoking input set I) → profiler → trace]
  • 21.
    For this reason, the program is executed once more, this time without invoking the feature of interest. This gives you all code that is executed when the feature is not used.
    [Slide: source code → compiler → executable; invoke with feature (invoking input set I) → profiler → trace; invoke without feature (excluding input set E) → profiler → trace]
  • 22.
    Now we have two sets: the code executed for the feature of interest, and the code that is executed even though the feature was not invoked. The difference between these two sets gives us the code that is more specific to the feature of interest.
    [Slide: as before, plus: set difference I − E → starting set for static analysis (Wilde et al., 1992)]
  • 23.
    Here is a simple example. Let this be the dynamic call graph of XFig when the feature was executed.
    [Slide: call graph with feature: main, draw, draw arc, set centc]
  • 24.
    Then we execute the program once more, this time without the feature. We obtain this other call graph (shown in red), which overlaps with the first one.
    [Slide: call graphs with and without the feature; routines shown: main, draw, draw arc, set centc, set cente, load]
  • 25.
    We compute the difference between them and identify set centc as the routine that was executed only for the feature of interest.
    [Slide: Problems of dynamic analysis • Results depend upon the input and are, thus, incomplete. • Set difference is binary: an element is either in the set or not; some of the code in the excluding input set may still be somewhat relevant to the feature.]
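    Wilde's set difference can be sketched in a few lines of Python. This is a minimal illustration, not Wilde's tool: the routine names are taken from the slides, and the exact contents of the run without the feature (here assumed to be main, draw, draw arc, set cente, and load) are an assumption read off the overlapping call graphs.

```python
# Routines executed in each run. The "with" set follows the slides; the
# "without" set is an assumption read off the overlapping call graphs.
with_feature = {"main", "draw", "draw arc", "set centc"}
without_feature = {"main", "draw", "draw arc", "set cente", "load"}


def starting_set(invoking, excluding):
    """Wilde-style set difference I - E: routines executed only with the feature."""
    return invoking - excluding


result = starting_set(with_feature, without_feature)
print(sorted(result))  # ['set centc']
```

    The sketch also exhibits the second problem listed on the slide: draw arc drops out of the result entirely, even though it may still be somewhat relevant to the feature.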
  • 26.
    An alternative approach was proposed by Vaclav Rajlich, another pioneer in concept location.
  • 27.
    Vaclav proposed a static technique. Here, the idea is to extract a static dependency graph. The user browses the call graph, and a tool supports the navigation, similar to a web browser.
    [Slide: source code → call graph extractor → call graph → call graph traversal. Problems: • Where to start? • Where to continue? • When to stop? • Static analysis is difficult. Routines shown in the call graph: main, draw, draw arc, load, move, save, set centc, set cente, set ru ll, set text]
  • 28.
    Our technique combines these two ideas and generalizes from one feature of interest to multiple features. First, we run a dynamic analysis similar to Norman Wilde’s idea.
    [Slide: source code → compiler → executable; invoke feature f1 → profiler → trace → routines (f1); invoke feature f2 → profiler → trace → routines (f2)]
  • 29.
    We are interested in many features, not only one. For instance, we want to understand what the difference is between drawing circles, rectangles, and text. For this reason, we execute the program more than once: at least once for each feature of interest. This gives us an invocation table. Each column of that table lists the routines executed for one feature. Since we have many such columns, a simple set difference no longer suffices. Instead, we use formal concept analysis, which I will describe shortly.
    [Slide: invoke feature f1, f2, …, fn → profiler → trace → routines (f1), routines (f2), …, routines (fn) → invocation table → concept analysis → concept lattice]
  • 30.
    The information we obtain from formal concept analysis is then used to help navigate the static call graph. It tells us where to start, where to continue, and where to stop. I will describe all these steps with an example.
    [Slide: the pipeline above, extended with: source code → call graph extractor → call graph → call graph traversal, guided by the concept lattice]
  • 31.
    In our example of XFig, we are interested in its capabilities of drawing different graphical objects. For each such object, we prepare one usage scenario or test case. Each tries to execute only one feature of interest and as few other features as possible. For instance, we prepare four test cases or usage scenarios:
    [Slide: Scenarios • draw an ellipse (drawEllipsis) • draw a circle (drawCircle) • draw a rectangle (drawRectangle) • draw text (drawText)]
  • 32.
    Here is the result of the dynamic analysis: the invocation table. Each column describes the set of routines executed for the respective feature. In Norman Wilde’s approach, we would have two columns. Here we have many. Consequently, a simple binary set difference is no longer possible. Instead, we use formal concept analysis.
    Invocation table:
                  drawEllipsis  drawCircle  drawRectangle  drawText
      main             ×            ×             ×           ×
      draw             ×            ×             ×           ×
      draw arc         ×            ×
      set centc                     ×
      set cente        ×
      set ru ll                                   ×
      set text                                                ×
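    As a sketch under the assumption that each profiler run yields the set of executed routines, the invocation table is just the relation of (routine, scenario) pairs. The traces below are read off the invocation table; routine names are copied as they appear on the slides, and the variable names are illustrative.

```python
# Profiler output per scenario, as read off the invocation table.
traces = {
    "drawEllipsis": {"main", "draw", "draw arc", "set cente"},
    "drawCircle": {"main", "draw", "draw arc", "set centc"},
    "drawRectangle": {"main", "draw", "set ru ll"},
    "drawText": {"main", "draw", "set text"},
}

# The invocation table is the relation R, a subset of O x A,
# holding (routine, scenario) pairs.
R = {(routine, scenario) for scenario, routines in traces.items() for routine in routines}

scenarios = list(traces)
routines = sorted({routine for routine, _ in R})

# Print the table: "x" where the routine was executed for the scenario.
print(f"{'':<10}", *scenarios)
for routine in routines:
    marks = ["x" if (routine, s) in R else "." for s in scenarios]
    print(f"{routine:<10}", *marks)
```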
  • 33.
    Formal concept analysis is a mathematical technique to analyze binary relations. An invocation table is such a binary relation; of course, formal concept analysis can analyze arbitrary binary relations. It is based on:
      • a set of objects $O$ (here: the routines),
      • a set of attributes $A$ (here: the feature scenarios or test cases), and
      • a binary relation $R \subseteq O \times A$ between these objects and attributes that describes which object possesses which attributes (here: the invocation table).
  • 36.
    Given the relation, you can define a function that yields the set of common attributes for a given set of objects. For instance, the common attributes for main, draw, and draw arc are drawEllipsis and drawCircle. You can spot this in the invocation table as a completely filled rectangle.
    Common attributes for $O' \subseteq O$: $\sigma(O') := \{a \in A \mid (o, a) \in R \text{ for all } o \in O'\}$
  • 38.
    Analogously, you can define a function that yields all objects that have a given set of attributes. In this example, the common objects for drawEllipsis and drawCircle are main, draw, and draw arc.
    Common objects for $A' \subseteq A$: $\tau(A') := \{o \in O \mid (o, a) \in R \text{ for all } a \in A'\}$
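    The two functions σ and τ translate directly into Python. This is a minimal sketch over the example table; TABLE (mapping each routine to the scenarios that executed it) and the function names sigma and tau are illustrative choices, not part of any tool.

```python
# Invocation table from the slides: routine -> scenarios that executed it.
TABLE = {
    "main": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw arc": {"drawEllipsis", "drawCircle"},
    "set centc": {"drawCircle"},
    "set cente": {"drawEllipsis"},
    "set ru ll": {"drawRectangle"},
    "set text": {"drawText"},
}
OBJECTS = frozenset(TABLE)
ATTRIBUTES = frozenset().union(*TABLE.values())


def sigma(objs):
    """Common attributes: scenarios that executed every routine in objs.

    For the empty set of objects this returns all attributes.
    """
    return frozenset(a for a in ATTRIBUTES if all(a in TABLE[o] for o in objs))


def tau(attrs):
    """Common objects: routines executed in every scenario in attrs.

    For the empty set of attributes this returns all objects.
    """
    return frozenset(o for o in OBJECTS if set(attrs) <= TABLE[o])


print(sorted(sigma({"main", "draw", "draw arc"})))  # ['drawCircle', 'drawEllipsis']
print(sorted(tau({"drawEllipsis", "drawCircle"})))  # ['draw', 'draw arc', 'main']
```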
  • 40.
    Given these two functions, you can define a formal concept: a pair of a set of objects and a set of attributes such that the attributes are exactly those shared by all the objects, and the objects are exactly those that have all the attributes. For example, main, draw, and draw arc together with drawEllipsis and drawCircle form a formal concept.
    Formal concept: $c = (O', A')$ with $A' = \sigma(O') \wedge O' = \tau(A')$
  • 42.
    Intuitively, you are searching for maximal filled rectangles in this table, where you may permute rows and columns.
  • 43.
    The set of all concepts in this table is listed here. Now, let us pick two of these concepts and take a closer look.
      c1 = ({main, draw}, {drawEllipsis, drawCircle, drawText, drawRectangle})
      c2 = ({draw arc, main, draw}, {drawEllipsis, drawCircle})
      c3 = ({set cente, draw arc, main, draw}, {drawEllipsis})
      c4 = ({set centc, draw arc, main, draw}, {drawCircle})
      c5 = ({set text, main, draw}, {drawText})
      c6 = ({set ru ll, main, draw}, {drawRectangle})
      c7 = ({set ru ll, set text, set centc, set cente, draw arc, main, draw}, ∅)
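    The concept list can be reproduced by brute force: for every subset A′ of attributes, the pair (τ(A′), σ(τ(A′))) is a concept, and every concept arises this way from its own intent. This naive enumeration is exponential in the number of attributes and is only meant for a toy example like this one; it is not meant to reflect how a real tool computes the lattice. The helper names are illustrative.

```python
from itertools import combinations

# Invocation table from the slides: routine -> scenarios that executed it.
TABLE = {
    "main": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw arc": {"drawEllipsis", "drawCircle"},
    "set centc": {"drawCircle"},
    "set cente": {"drawEllipsis"},
    "set ru ll": {"drawRectangle"},
    "set text": {"drawText"},
}
OBJECTS = frozenset(TABLE)
ATTRIBUTES = frozenset().union(*TABLE.values())


def sigma(objs):
    return frozenset(a for a in ATTRIBUTES if all(a in TABLE[o] for o in objs))


def tau(attrs):
    return frozenset(o for o in OBJECTS if set(attrs) <= TABLE[o])


def is_concept(objs, attrs):
    """A pair is a formal concept iff A' = sigma(O') and O' = tau(A')."""
    return sigma(objs) == attrs and tau(attrs) == objs


def all_concepts():
    """Close every attribute subset; (tau(A'), sigma(tau(A'))) is always a concept."""
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), k):
            extent = tau(subset)
            concepts.add((extent, sigma(extent)))
    return concepts


concepts = all_concepts()
print(len(concepts))  # 7
```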
  • 44.
    For instance, we pick these two concepts:
      ({draw arc, main, draw}, {drawEllipsis, drawCircle})
      ({set centc, draw arc, main, draw}, {drawCircle})
    We see that the objects of the first one are a subset of the objects of the second one. Likewise, the attributes of the second one are a subset of the attributes of the first one. The second one has fewer attributes; consequently, more objects have these attributes. If you think of a concept as a class in an object-oriented programming language, this observation would be expressed as a superclass/subclass relation: the first concept has all attributes of the second one plus additional attributes.
  • 45.
    This allows us to define an ordering between concepts, analogous to subclassing. A concept c1 is smaller than a concept c2 if all objects of c1 are contained in c2 or, dually, if all attributes of c2 are contained in c1:
    Let $c_1 = (O_1, A_1)$ and $c_2 = (O_2, A_2)$ be concepts; $c_1 \le c_2 :\Leftrightarrow O_1 \subseteq O_2$, or dually, $c_1 \le c_2 :\Leftrightarrow A_2 \subseteq A_1$.
  • 46.
    In that case, c2 is called a superconcept of c1, and c1 is a subconcept of c2.
  • 47.
    This partial order forms a lattice, called the concept lattice. Lattices can be visualized with Hasse diagrams.
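    The edges of a Hasse diagram are the covering pairs of this order: c1 is connected to c2 when c1 < c2 and no concept lies strictly between them. The sketch below reuses the brute-force enumeration from the example table; the nested loops are fine for seven concepts, and all helper names are illustrative.

```python
from itertools import combinations

# Invocation table from the slides: routine -> scenarios that executed it.
TABLE = {
    "main": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw": {"drawEllipsis", "drawCircle", "drawRectangle", "drawText"},
    "draw arc": {"drawEllipsis", "drawCircle"},
    "set centc": {"drawCircle"},
    "set cente": {"drawEllipsis"},
    "set ru ll": {"drawRectangle"},
    "set text": {"drawText"},
}
OBJECTS = frozenset(TABLE)
ATTRIBUTES = frozenset().union(*TABLE.values())


def sigma(objs):
    return frozenset(a for a in ATTRIBUTES if all(a in TABLE[o] for o in objs))


def tau(attrs):
    return frozenset(o for o in OBJECTS if set(attrs) <= TABLE[o])


def all_concepts():
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), k):
            extent = tau(subset)
            concepts.add((extent, sigma(extent)))
    return concepts


def leq(c1, c2):
    """c1 <= c2 iff the extent of c1 is a subset of the extent of c2."""
    return c1[0] <= c2[0]


def hasse_edges(concepts):
    """Covering pairs (lower, upper): lower < upper with no concept strictly between."""
    edges = []
    for lower in concepts:
        for upper in concepts:
            if lower == upper or not leq(lower, upper):
                continue
            between = any(
                mid != lower and mid != upper and leq(lower, mid) and leq(mid, upper)
                for mid in concepts
            )
            if not between:
                edges.append((lower, upper))
    return edges


concepts = all_concepts()
edges = hasse_edges(concepts)
print(len(edges))  # 9
```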
  • 48.
    Here, we see the Hasse diagram of our example. The nodes are the concepts; the edges represent the partial order. By convention, an edge is directed from bottom to top; that is, superconcepts are at the top and subconcepts below. We see the attributes in blue; in our case, these are our features of interest. We also see the objects, which are the routines executed for these features. There are two special concepts, namely, the top and the bottom element. The top element consists of all objects and their common attributes. The bottom element consists of all attributes and the objects that possess all these attributes. By the definition of the ordering of concepts, every subconcept has all attributes of its superconcepts. Likewise, every superconcept has all objects of its subconcepts. That is, there is a lot of redundancy in this Hasse diagram.
    [Slide: Hasse diagram of the concept lattice, concepts numbered 0 to 6, each node labeled with its full sets of routines and features]
  • 49.
    The sparse Hasse diagram avoids this redundancy. Each object and every attribute is listed only once, at the concept where it first appears: an attribute at the topmost concept that has it, an object at the bottommost concept that contains it. By the definition of the ordering of concepts, we can infer where else they appear: attributes are inherited by all subconcepts, objects by all superconcepts. The sparse representation is much more readable.
    [Figure: sparse Hasse diagram of the example]
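The labeling rule of the sparse diagram can be sketched on a hypothetical six-routine table: a feature f is attached to the concept it generates, (f', f''), and a routine r to the concept it generates, (r'', r'). Concepts are identified by their intent here; the data and names are illustrative assumptions.

```c
#include <stdint.h>

#define N_ROUTINES 6
#define N_FEATURES 3
#define ALL_FEATURES ((1u << N_FEATURES) - 1u)

/* Hypothetical context. Feature bits: 0 drawCircle, 1 drawEllipsis,
   2 drawRectangle. Routines: main, draw, draw_arc, set_centc,
   set_cente, set_rull (bits 0 to 5). */
static const uint32_t features_of[N_ROUTINES] = {7u, 7u, 3u, 1u, 2u, 4u};

static uint32_t routines_with(uint32_t features) {
    uint32_t o = 0;
    for (int r = 0; r < N_ROUTINES; r++)
        if ((features & ~features_of[r]) == 0) o |= 1u << r;
    return o;
}

static uint32_t features_shared(uint32_t routines) {
    uint32_t a = ALL_FEATURES;
    for (int r = 0; r < N_ROUTINES; r++)
        if (routines & (1u << r)) a &= features_of[r];
    return a;
}

/* Intent of the topmost concept containing feature f: {f}''. */
uint32_t attribute_concept(int f) {
    return features_shared(routines_with(1u << f));
}

/* Intent of the bottommost concept containing routine r: {r}',
   which is always closed. */
uint32_t object_concept(int r) { return features_of[r]; }

/* Routines printed at the concept with this intent in the sparse diagram. */
uint32_t routine_labels(uint32_t intent) {
    uint32_t o = 0;
    for (int r = 0; r < N_ROUTINES; r++)
        if (object_concept(r) == intent) o |= 1u << r;
    return o;
}

/* Features printed at the concept with this intent in the sparse diagram. */
uint32_t feature_labels(uint32_t intent) {
    uint32_t a = 0;
    for (int f = 0; f < N_FEATURES; f++)
        if (attribute_concept(f) == intent) a |= 1u << f;
    return a;
}
```

In this example drawCircle and set_centc share one node, draw_arc sits alone on the node for {drawCircle, drawEllipsis}, main and draw sit on the bottom node, and the top node carries no label.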
  • 50.
    We can use the sparse Hasse diagram in combination with the static call graph as follows. If we want to know which routines are specific to a given feature, we simply look for that feature in the lattice. Let us assume we are interested in the feature drawCircle. If the concept at which this feature occurs has no other feature, all routines listed at this concept are specific to this feature. In our example, we would start browsing the call graph at routine set centc. This information could just as well have been obtained by simple set difference operations. But the lattice provides more information. If we look at the subconcept of concept 3, we find a concept annotated with routine draw arc. This routine also contributes to feature drawEllipsis because that concept is also a subconcept of concept 2. Thus draw arc serves two features. Because it is also required for feature drawEllipsis, it is less specific than set centc. Yet, it is more specific than main and draw, which are listed at a transitive subconcept of concept 3. While set difference is binary, the lattice gives a finer ranking of feature specificity. That is, you would continue your navigation of the static call graph at draw arc, and then also look at main and draw.
    [Figure: sparse Hasse diagram of the example]
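The contrast between set difference and the lattice's finer ranking can be sketched on a hypothetical routine/feature table. Ranking routines by how many features execute them is a simplification assumed here: it agrees with the lattice order (a lower concept always carries more features) but flattens incomparable concepts into one list.

```c
#include <stdint.h>

#define N_ROUTINES 6
#define N_FEATURES 3

/* Hypothetical context. Feature bits: 0 drawCircle, 1 drawEllipsis,
   2 drawRectangle. Routines 0..5: main, draw, draw_arc, set_centc,
   set_cente, set_rull. */
static const uint32_t features_of[N_ROUTINES] = {7u, 7u, 3u, 1u, 2u, 4u};

/* Routines executed for feature f, as a bit set. */
static uint32_t routines_for(int f) {
    uint32_t o = 0;
    for (int r = 0; r < N_ROUTINES; r++)
        if (features_of[r] & (1u << f)) o |= 1u << r;
    return o;
}

/* Set difference: routines of f that no other feature executes. */
uint32_t specific_routines(int f) {
    uint32_t others = 0;
    for (int g = 0; g < N_FEATURES; g++)
        if (g != f) others |= routines_for(g);
    return routines_for(f) & ~others;
}

static int count_bits(uint32_t x) {
    int n = 0;
    for (; x != 0; x >>= 1) n += (int)(x & 1u);
    return n;
}

/* Routines of f ordered from most to least specific (fewest features
   first); stable insertion sort, so ties keep routine order.
   Returns the number of routines written to order[]. */
int rank_routines(int f, int order[N_ROUTINES]) {
    int n = 0;
    for (int r = 0; r < N_ROUTINES; r++) {
        if (!(features_of[r] & (1u << f))) continue;
        int k = n++;
        while (k > 0 && count_bits(features_of[order[k - 1]]) >
                            count_bits(features_of[r])) {
            order[k] = order[k - 1];
            k--;
        }
        order[k] = r;
    }
    return n;
}
```

For drawCircle, the set difference returns only set_centc, while the ranking yields set_centc, draw_arc, main, draw, which is the browsing order described for the lattice.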
  • 51.
    In later work, we extended this approach to handle cases in which there is no one-to-one mapping between features and scenarios. Furthermore, we used concept analysis incrementally, so that you start with a small set of features and then extend it to a larger set without losing your previous knowledge.
    [Figure: sparse Hasse diagram]
  • 52.
    Now we come to one case study that was not published in our TSE paper. We tried this technique in an industrial case study on this machine here. This machine is a chip tester. It is used by chip manufacturers to check whether a chip works correctly before it is shipped. A robot puts the chip into the machine, and the machine runs various tests that can be programmed by a test engineer.
  • 53.
    The software architecture of the firmware of this chip tester is sketched here. The firmware provides the basic operations used by various applications, that is, by tools to implement, configure, run, analyze, and visualize tests. Because these applications run in parallel, the first layer of the architecture provides some synchronization and means to exchange input and output. There is a programming language for writing these tests. The inputs to the firmware are such programs. They are first parsed and then executed. The firmware is written in C. For each operation in this programming language, there is exactly one C function that executes it. These functions are called executors. The executors use some shared utility functions to execute the operation. This architecture looks very tidy and structured. The truth, however, is that 90 % of the code is hidden in this box labeled utility functions. Nobody had a clear picture of which executors shared which utility functions.
    [Figure: firmware architecture: applications; semaphore, shared memory, message queue; YACC parser constructor; command/response; executors; utility functions; hardware; data flow and control flow]
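The question of which executors share which utility functions is what a trace per command answers. A hypothetical sketch: the executors and utilities are invented stand-ins (only the command names CNTR, CNTR? and RLYC come from the study), and each utility records itself in a bit set in place of a real tracing tool.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical utility functions; each marks its bit in the trace. */
enum { U_PARSE_ARGS, U_WRITE_REG, U_READ_REG };
static uint32_t trace; /* bit u is set once utility u has run */

static void util_parse_args(void) { trace |= 1u << U_PARSE_ARGS; }
static void util_write_reg(void)  { trace |= 1u << U_WRITE_REG; }
static void util_read_reg(void)   { trace |= 1u << U_READ_REG; }

/* One executor per command; the bodies are illustrative only. */
static void exec_cntr(void)   { util_parse_args(); util_write_reg(); }
static void exec_cntr_q(void) { util_parse_args(); util_read_reg(); }
static void exec_rlyc(void)   { util_write_reg(); }

typedef struct {
    const char *command;
    void (*executor)(void);
} entry_t;

static const entry_t table[] = {
    {"CNTR", exec_cntr}, {"CNTR?", exec_cntr_q}, {"RLYC", exec_rlyc},
};

/* Run one command and return the set of utilities it reached, i.e. one
   row of the executor-by-utility table; 0 for an unknown command. */
uint32_t run_and_trace(const char *command) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].command, command) == 0) {
            trace = 0;
            table[i].executor();
            return trace;
        }
    }
    return 0;
}
```

Each returned bit set is one row of the table that concept analysis works on; intersecting two rows shows the shared utilities, here only the register write for CNTR and RLYC.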
  • 54.
    Here is the static call graph of the firmware. It consists of roughly 10,000 routines and is very complex.
  • 55.
    We analyzed 76 different operations of this programming language. Related operations can be grouped into categories. For instance, there are operations for the configuration setup, relay control, and many more. Since the operations of one category are semantically related, we would assume that they also share a lot of utility functions. That was one of the hypotheses we investigated.
    Configuration Setup: CNTR, CNTR?, CONF, CONF?, UDEF, UDPS, UDGP, DPFN, DFPN?, DFPS, DFPS?, DFGP, DFGP?, DFGE, DFGE?, PALS, PALS?, PSTE, PSTE?, PSFC, PSFC?, PQFC, PQFC?, PACT, PACT?
    Relay Control (Test Execution): RLYC, RLYC?
    Level Setup Commands: LSUS, LSUS?, DRLV, DRLV?, RCLV, RCLV?, TERM, TERM?
    Timing Setup Commands: PCLK, PCLK?, DCDF, DCDF?, WFDF, WFDF?, WAVE, WAVE?, ETIM, ETIM?, BWDF, BWDF?
    Vector Setup Commands: SQLA, SQLB, SQLB?, SQPG, SQPG?, SPRM, SPRM?, SQSL, SQSL?
    Misc.: FTST, VBMP, PSLV, CLMP, WSDM, DCDT, CLKR, VECC, SDSC, SREC, DMAS, STML
  • 56.
    To locate these 76 features, we provided one test case for each. To factor out all C functions that are executed for all commands, we added one test case that contained only the NOP command, which does nothing at all. Because some commands allow variant parameters, we added test cases to cover these, too. In order to factor out code for startup, shutdown, etc., we added one test case in which the firmware was started and immediately shut down again. Because some of the commands had certain preconditions, we added test cases to fulfill these preconditions. In total, we had 93 test cases.
    Scenarios: 76 for relevant commands; 1 for the NOP command; 2 additional parameter combinations; 1 start-end; 13 for preparing steps; 93 in total.
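The scenario counts add up as follows:

```latex
\underbrace{76}_{\text{commands}}
+ \underbrace{1}_{\text{NOP}}
+ \underbrace{2}_{\text{parameter variants}}
+ \underbrace{1}_{\text{start-end}}
+ \underbrace{13}_{\text{preparing steps}}
= 93
```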
  • 57.
    Here is the resulting concept lattice for our study. The height of each concept is proportional to the number of routines it contains, except for the bottom element. In the first layer of the lattice, you find the code of the executors and all functions that were executed only by these executors. Below that layer, you can see which utility functions are shared by which executors. Furthermore, we could confirm our hypothesis that executors of the same category have more utility functions in common. To validate our findings, we asked one developer of this firmware whether this lattice made any sense to him. It did, and he learned new things he had not known before.
  • 58.
    In another case study, published at ASE, we evaluated our technique for two C compilers, namely, SDCC and cc1, the C compiler of GCC. The motivation of this study was to evaluate whether a finer-grained dynamic analysis is feasible and pays off, and whether the technique scales to very large feature sets. The features of interest were different loop constructs and mathematical expressions in C. In addition to that, we looked at different compiler optimization options.
    Study of SDCC / GCC (cc1). Features of interest: loops (do-while, while, for, if-goto); mathematical expressions (+, -, *, /, int literals); optimization options.
  • 59.
    In the earlier study, we traced routines. In this compiler study, we wanted to try the statement level.
    Granularity: Routine vs. Statements
  • 60.
    For instance, constructs such as this are to be expected in compilers written in a procedural language. The routine handle() would be called for all loop constructs, but not all of its code is executed for each loop construct. If we trace at the level of basic blocks, we can find the code within handle() that is specific to handling the DO loop in C, for instance.
        void handle( ... ) {
            switch ( ... ) {
                case DO : ...
                case WHILE : ...
                case FOR : ...
                ...
            }
        }
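A runnable version of the handle() construct, with hand-placed coverage flags standing in for whatever instrumentation records basic blocks (an assumption; the enum values and flag names are illustrative). At the routine level, both scenarios below would only show that handle() ran; at the block level they differ.

```c
#include <stdbool.h>

/* Hypothetical loop kinds handled by one routine. */
typedef enum { DO, WHILE, FOR } loop_kind;

/* One coverage flag per basic block: the entry block and each case body. */
enum { B_ENTRY, B_DO, B_WHILE, B_FOR, N_BLOCKS };
static bool block_hit[N_BLOCKS];
#define HIT(b) (block_hit[(b)] = true)

void handle(loop_kind kind) {
    HIT(B_ENTRY);
    switch (kind) {
    case DO:    HIT(B_DO);    /* ... translate a do-while loop ... */ break;
    case WHILE: HIT(B_WHILE); /* ... translate a while loop ... */    break;
    case FOR:   HIT(B_FOR);   /* ... translate a for loop ... */      break;
    default:    break;
    }
}

/* Clear all flags before running the next scenario. */
void reset_coverage(void) {
    for (int b = 0; b < N_BLOCKS; b++) block_hit[b] = false;
}
```

Running a DO scenario marks only the entry block and the DO case; a FOR scenario marks the entry block and the FOR case, so the case bodies are the code specific to each loop kind.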
  • 61.
    In terms of concept lattices, tracing at the level of basic blocks is a refinement of tracing at the routine level.
    [Figure: routine R = {B0, B1, B2}; scenarios T1, T2; features F1, F2, F3]
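A sketch of why block-level tracing refines routine-level tracing, assuming for illustration that scenario T1 executes blocks B0 and B1 of routine R and scenario T2 executes B0 and B2 (these traces are hypothetical). Projecting either trace to the routine level gives the same answer, R executed, while the blocks still tell the scenarios apart.

```c
#include <stdint.h>

/* Routine R consists of blocks B0, B1, B2 (bit b = block Bb). */
#define R_BLOCKS 0x7u

/* A routine counts as executed if any of its blocks was executed. */
int routine_executed(uint32_t block_trace, uint32_t routine_blocks) {
    return (block_trace & routine_blocks) != 0;
}

/* Blocks executed by exactly one of two scenarios. */
uint32_t distinguishing_blocks(uint32_t t1, uint32_t t2) {
    return t1 ^ t2;
}
```

With T1 = {B0, B1} and T2 = {B0, B2}, both scenarios execute R, but B1 and B2 separate them; this is how one routine-level concept can split into several block-level concepts.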
  • 62.
    That is, a concept in the lattice for routines may be split into several concepts in the lattice for basic blocks. Thus, you gain more detail. The additional level of detail does not come for free, however. The dynamic analysis becomes more expensive. Even worse, the lattice becomes bigger: in the worst case, its size grows exponentially with the number of attributes and objects (up to $2^{\min(|O|,|A|)}$ concepts). So we were wondering whether tracing at the basic block level is feasible.
    [Figure: the routine-level concept for R = {B0, B1, B2} split into block-level concepts 1, 2, and 3 over scenarios T1, T2 and features F1, F2, F3]
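The worst case is easy to reproduce with a brute-force concept counter. A sketch, assuming contexts with at most 16 attributes stored as bit sets (function names are illustrative): in a contranominal context, where object i has every attribute except i, every attribute set is closed, so n attributes give 2^n concepts, whereas a one-to-one (diagonal) context of the same size has only n + 2.

```c
#include <stdint.h>

/* Count concepts by testing every attribute set A for closure (A'' == A).
   Exponential in n_attrs by design; intended for n_attrs <= 16. */
unsigned long count_concepts(const uint32_t *attrs_of, int n_objs, int n_attrs) {
    uint32_t all = (1u << n_attrs) - 1u;
    unsigned long count = 0;
    for (uint32_t a = 0; a <= all; a++) {
        uint32_t closure = all;          /* intersection over an empty A' */
        for (int o = 0; o < n_objs; o++)
            if ((a & ~attrs_of[o]) == 0) /* object o has every attribute in A */
                closure &= attrs_of[o];
        if (closure == a) count++;
    }
    return count;
}

/* Contranominal context: object i has every attribute except i. */
void contranominal(uint32_t *attrs_of, int n) {
    uint32_t all = (1u << n) - 1u;
    for (int i = 0; i < n; i++) attrs_of[i] = all & ~(1u << i);
}

/* Diagonal context: object i has only attribute i. */
void diagonal(uint32_t *attrs_of, int n) {
    for (int i = 0; i < n; i++) attrs_of[i] = 1u << i;
}
```

For four objects and four attributes, the contranominal context yields 16 concepts and the diagonal one 6; with twelve of each, the contranominal context already yields 4,096.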
  • 63.
    Here are some size numbers concerning the input for concept analysis when tracing at the level of either routines or basic blocks. The analysis at the basic block level was slowed down by a factor of between 50 and 200. But we could in fact compute the lattice in all cases. Furthermore, by the analysis at the basic block level, we could find details that could not have been found at the routine level.
    |                        | sdcc   | cc1     |
    |------------------------|--------|---------|
    | #routines              | 1,325  | 15,986  |
    | #routines executed     | 650    | 2,657   |
    | #basic blocks          | 46,699 | 379,086 |
    | #basic blocks executed | 10,113 | 34,602  |
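A quick calculation from these numbers shows how much of each compiler the scenarios reached and how much larger the block-level input is than the routine-level one:

```latex
\begin{aligned}
\text{routines executed:} &\quad \text{sdcc } \tfrac{650}{1{,}325} \approx 49.1\,\%, & \text{cc1 } \tfrac{2{,}657}{15{,}986} \approx 16.6\,\% \\
\text{basic blocks executed:} &\quad \text{sdcc } \tfrac{10{,}113}{46{,}699} \approx 21.7\,\%, & \text{cc1 } \tfrac{34{,}602}{379{,}086} \approx 9.1\,\% \\
\text{executed blocks per executed routine:} &\quad \text{sdcc } \tfrac{10{,}113}{650} \approx 15.6, & \text{cc1 } \tfrac{34{,}602}{2{,}657} \approx 13.0
\end{aligned}
```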
  • 64.
    While we had 76 different features in the earlier study, we wanted to see whether the technique still scales to even larger feature sets. If you can combine features freely, there is easily a combinatorial explosion of possible features.
    Scalability for Large Feature Sets
  • 65.
    Therefore, we looked at 100 different C language constructs in combination with different compiler backends and additional command-line options.
    Features: 100 test cases for 100 C language constructs; one/multiple backends; no/two compiler switches
  • 66.
    If we use only one backend of SDCC and no compiler switches, the lattice has about 80,000 concepts.
  • 67.
    If we use multiple backends of SDCC and two compiler switches, the lattice has about 4.5 million concepts.
  • 68.
    If we use the simple configuration for cc1 (one backend, no compiler switches), the input to concept analysis, the invocation table, has 1.3 million entries set. For this size, we were not able to compute the lattice. So, the approach does not scale to large sets of feature combinations. The lattice should be computed only on demand for subsets of features.
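A sketch of computing concepts on demand for a feature subset, assuming the bit-set encoding used for small contexts (the function name is illustrative): only subsets of the selected features are enumerated, and each closure is intersected with the selection. On a hypothetical worst-case context with 16 features, where routine i is executed for every feature except i, the full context has 65,536 concepts, while selecting four features leaves 16.

```c
#include <stdint.h>

/* Count the concepts of the context restricted to the features in `keep`.
   features_of[r] is the feature set of routine r. */
unsigned long count_concepts_on(const uint32_t *features_of, int n_routines,
                                uint32_t keep) {
    unsigned long count = 0;
    uint32_t a = 0;
    do {
        uint32_t closure = keep;          /* intersection over an empty A' */
        for (int r = 0; r < n_routines; r++)
            if ((a & ~features_of[r]) == 0)
                closure &= features_of[r];
        if (closure == a) count++;
        a = (a - keep) & keep;            /* next subset of keep, 0 after keep */
    } while (a != 0);
    return count;
}
```

The update `(a - keep) & keep` steps through every subset of `keep` in increasing order and wraps to zero after `keep` itself, so the cost depends on the number of selected features rather than on all features.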
  • 69.
    Now, let's turn to the question of what others have done. If you are interested in this question, I recommend reading this upcoming paper by our hosts. They have written a very nice survey of papers on feature location that will soon appear in the Journal of Software Maintenance and Evolution. I know they are constantly renaming this journal, but I stick to this name.
    Feature Location in Source Code: A Taxonomy and Survey. Bogdan Dit, Meghan Revelle, Malcom Gethers, Denys Poshyvanyk. The College of William and Mary. Journal of Software Maintenance and Evolution, to appear.
  • 70.
    Denys and his colleagues have reviewed 89 articles from 25 venues and classified them within a taxonomy. Here is the distribution of these papers across the venues. ICSM is second. The premier conference for feature location seems to be ICPC. However, the chances of getting an award for a feature location paper are higher at ICSM.
  • 71.
    There have been several improvements to the dynamic analysis. Some researchers, for instance, take the frequency of execution into account. The intuition is that the more often code is executed, the more relevant it should be. They also improved the recording of traces: you start and end the recording while the program is running, so that you observe the execution only right after you have triggered the feature of interest. In addition to that, textual approaches based on methods from information retrieval emerged. Andrian Marcus is one pioneer in this field, and Denys Poshyvanyk has continued this work.
    Dynamic Analyses; Static Analyses; Textual approaches
  • 72.
    The authors have summarized several open issues in feature location. I am listing here those that I find most important. We have many competing feature location techniques, but we have no clear picture yet of when to use which. There was one experiment by Vaclav Rajlich and Norman Wilde in which they compared their static and dynamic approaches. But there is no comprehensive evaluation. Nor are there accepted benchmarks. Luckily, Denys and colleagues have started to create some. There is no Eclipse plugin for feature location, other than maybe prototypes. The techniques we developed are not really used in the field. We have not yet found the right way to integrate such tools smoothly into the developer's toolkit. In order to do so, we must better understand how programmers do feature location. There are some initial observational studies. We need more of these, and we also need tool evaluations with real programmers.
    Open Issues: accepted evaluation procedures and benchmarks; tool adoption in industry; user studies
  • 73.
    Finally, let me conclude with a quote from a senior researcher, stated in a panel at CSMR 2009 in Kaiserslautern. He said that feature location is irrelevant in industry. I have never had the chance to ask him what he meant by this statement. I personally do sometimes need to locate features in my code. Regrettably, I am still mostly using grep.
    "Feature location is irrelevant in industry." Senior Researcher, CSMR 2009, Kaiserslautern