Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Pldi09 semantics aware trace analysis


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Pldi09 semantics aware trace analysis

  1. 1. Semantics-Aware Trace Analysis Kevin Hoffman Patrick Eugster Suresh Jagannathan Computer Science Department, Purdue University {kjhoffma,peugster,suresh}@cs.purdue.eduAbstract This paper introduces a novel semantics-aware dynamic analysis for understanding and analyzing complex programs. Our approachAs computer systems continue to become more powerful and com- achieves accuracy, flexibility and scalability through the conceptplex, so do programs. High-level abstractions introduced to deal of semantic views. Semantic views are trace abstractions whichwith complexity in large programs, while simplifying human rea- selectively aggregate collections of events with shared semanticsoning, can often obfuscate salient program properties gleaned traits found in a program execution trace, allowing for specificfrom automated source-level analysis through subtle (often non- aspects and abstractions in programs to be captured accurately.local) interactions. Consequently, understanding the effects of pro- Flexibility is achieved by allowing for new, specific views to begram changes and whether these changes violate intended protocols defined and applied. Views are linked among each other to capturebecome difficult to infer. Refactorings, and feature additions, mod- program semantics in a scalable manner, and can be leveragedifications, or removals can introduce hard-to-catch bugs that often by profilers, optimizers, and bug-finders to quickly sift through ago undetected until many revisions later. program execution, focusing attention on those parts that signify interesting deviations or properties.To address these issues, this paper presents a novel dynamic pro-gram analysis that builds a semantic view of program executions. We illustrate the usefulness of our approach by examining its utilityThese views reflect program abstractions and aspects; however, to identify regressions in large (well-tested) software applications.views are not simply projections of execution traces, but are linked We define a regression as a behavior that executed correctly un-to each other to capture semantic interactions among abstractions der some input in a prior software version that is no longer correctat different levels of granularity in a scalable manner. under the same input in a newer version. Revisions are commonWe describe our approach in the context of Java and demonstrate its in complex software systems, and refactorings and feature updatesutility to improve regression analysis. We first formalize a subset can introduce subtle regression bugs that are often not detected inof Java and a grammar for traces generated at program execution. a timely manner. Identifying and fixing these regressions can beWe then introduce several types of views used to analyze regression a complex and time-consuming task, aggravated by the presencebugs along with a novel, scalable technique for semantic differenc- of advanced program mechanisms such as dynamic dispatch, codeing of traces from different versions of the same program. Bench- generation, and loading. For example, a query of the Apache Soft-mark results on large open-source Java programs demonstrate that ware Foundation bug tracking database1 shows 455 bugs createdsemantic-aware trace differencing can identify precise and useful during 2007 that involve a regression.2 Out of these 455 bugs, 36%details about the underlying cause for a regression, even in pro- took longer than two months to resolve, 15% took longer than sixgrams that use reflection, multithreading, or dynamic code genera- months, and 11% remain unresolved as of November 2008.tion, features that typically confound other analysis techniques. Our technique provides a scalable solution to identifying precisely the cause of regressions in large, evolving Java programs. GivenCategories and Subject Descriptors D.2.5 [Software Engineer- two traces corresponding to executions of a correct and regressinging]: Testing and Debugging—debugging aids, diagnostics, testing version of a program, we employ a novel differencing algorithmtools, tracing that extracts trace abstractions from these traces, and correlates executions based on their similarity by means of a novel, tractable,General Terms Algorithms, Reliability longest-common subseqeuence (LCS)-based approximation.1. Introduction Motivating example. To illustrate the problem, Fig. 1 shows code snippets patterned after a known regression identified asThe ability to understand and analyze interactions among program MYFACES-11303 in the Apache MyFaces project,4 an open sourcecomponents to infer or verify salient program properties is critical implementation of the Java Server Faces standard. The exam-as software complexity increases. ple centers around functionality in the MyFaces framework that automatically converts non-7-bit safe characters in the output of an HTTP request response into their equivalent HTML numeric entities. In this example, this conversion only occurs if the outputPermission to make digital or hard copies of all or part of this work for personal or document type is text/html, and each character is only convertedclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citation 1 the first page. To copy otherwise, to republish, to post on servers or to redistribute 2 Bugsto lists, requires prior specific permission and/or a fee. marked as duplicate or invalid were not included. 3’09, June 15–20, 2009, Dublin, Ireland.Copyright c 2009 ACM 978-1-60558-392-1/09/06. . . $5.00 4
  2. 2. Execution Trace (and Thread View) Target Object View for LOG Object #1 !!" $%&!()**+,-./0)1*231-(((/4 !!" $%&!()**+,-./0)1*231-(((/4     ((( 5!! $%&!()**+,-.(((4 5!! $%&!()**+,-.(((4 !!" $%&!()**+,-./678 97:(((/4 !!" 6J!(,78<7:=7,8;PQ7./87>8?@8A2/4 5!! $%&!()**+,-.(((4     !!" 6;<!(7:=)2,./87>8?@8A2/4     5!! 6;<!(7:=)2,.(((4 978B89=7     !!" CD+!(17E.FGH GI4 Method View for SP.setRequestType         ,78 CD+!(KA31M@)9<)1-7 B FG         ,78 CD+!(KA)>M@)9<)1-7 B GI !!" 6;<!(7:=)2,./87>8?@8A2/4     5!! CD+!(17E.(((4 5!! 6;<!(7:=)2,.(((4 978B89=7     ,78 6J!(KL31MN1O B CD+! !!" CD+!(17E.FGH GI4     ((( 5!! CD+!(17E.(((4     !!" $%&!()**+,-./678 97:(((/4 ,78 6J!(KL31MN1O B CD+!         ((( (((     5!! $%&!()**+,-.(((4 !!" $%&!()**+,-./678 97:(((/4 5!! 6J(,78<7:=7,8;PQ7.(((4 5!! $%&!()**+,-.(((4 Figure 2. A portion of the execution trace for our example is shown on the left. The example is single threaded, so there is a single thread view which is identical to the full execution trace. There are many target object and method views. One target object (a) (b) view is shown for the first LOG object, containing only methodFigure 1. (a) an original non-regressing and (b) newer, regressing calls and field accesses on that object. One method view is shownversion of a program. for the SP.setRequestType method, showing only actions that oc- curred when SP.setRequestType was on the top of the call stack.if it is not in the range [32..127]. This range is defined program- Contributions. This paper makes the following contributions:matically and kept in mutable variables. A bug was introduced ina new version of this program that inadvertently sets the range to[1..127] instead of [32..127], causing a regression when given a 1. Semantic views: Because execution traces can be millions ofdocument of type text/html that contains characters in the range entries long, we define a new view-based trace abstraction that[1..31]. allow traces to be tractably analyzed and traversed. These views reflect the different levels of abstraction that naturally arise inIn the original version, the ServletProcessor class directly instan- object-oriented programs (e.g. objects, methods, threads, etc.),tiates the NumericEntityUtil with the correct range. This object and provides a semantics-based means of structuring trace not used until after the HTTP request has been fully processed We present a formalization of traces and views for a subsetand the output generated; the actual character conversion process is of Java extending Featherweight Java with references, assign-affected by dynamic state initialized much earlier during program ments, and threads. We furthermore define how these views canexecution. In the new version, the BinaryCharFilter class was ex- be correlated, linked, and navigated.tracted from the ServletProcessor as part of a new generic I/Ofiltering abstraction, thus confounding static analysis as no read- 2. Tractable trace differencing: A new technique for differenc-ily apparent structural property is violated. The BinaryCharFilter ing view-based traces is introduced, to correlate common pointsclass however provides an incorrect range of [1..127] to the new of execution by efficiently and accurately approximating theNumericEntityUtil object, causing the character filtering process longest-common subsequence (LCS) [3] between full programto later produce incorrect results, but only for certain inputs. traces. Notably, our differencing technique has linear complex- ity in both time and space, allowing full program traces thatThis example is a pattern for an entire class of regressions caused include dynamic state to be used in the analysis while keepingwhen a piece of code incorrectly alters some dynamic state in the running times and memory requirements reasonable.program, with the manifestation of the error appearing, only incertain cases, at some later point in the execution.5 The causal 3. Regression cause analysis: We show how to use our differ-distance between the point where the error occurs and where it encing technique to accurately identify the causes of variousmanifests makes it difficult to precisely analyze such regressions regressions. The comparisons between traces for all differencemanually, or correlate information derived from dynamic program sets are analyzed to produce a final candidate set of regressionslicing techniques that identify all semantically related operations causes. Besides identifying differences that likely caused the re-to a regression. gression, the analysis outputs a full semantic “diff” between the original and new versions, allowing these potential causes to beFig. 2 illustrates the intuition underlying our approach: different viewed in their full context, with dynamic state.semantic views are used to capture individual aspects of the pro-gram and link them together. In the figure, we show views for themain thread of execution, as well as an object view for the log These techniques are implemented in a fully automated way in ourobject, and a method view for method setRequestType. Observe tool, RP RISM, requiring no code annotations or access to sourcethat these specialized views record events that may be temporally code. Benchmark results on realistic well-engineered Java pro-far removed from one another. Our technique correlates these views grams demonstrate that our techniques yield high accuracy (fewwith views generated by the regressing version to allow the regress- false positives or negatives produced), is scalable (traces with mil-ing behavior to be localized and identified. Making such correla- lions of entries can be analyzed), and is applicable even to programstions precise and scalable is feasible because views discard actions that exploit advanced features such as multithreading, reflection,unrelated to the behavior of the abstraction being traced. and dynamic code generation. To our knowledge, RP RISM is the first tool capable of performing automated regression analysis on5 See also programs with such a feature set.
  3. 3. The remainder of the paper is organized as follows: Sections 2 and 3 Fig. 5 presents relevant definitions. Besides the usual auxiliaryformalize our approach in the context of a subset of Java and view- functions for field and method body lookup, this includes evalu-based trace differencing respectively. Section 4 elaborates on these ation contexts used in the definition of the language’s operationalconcepts in the context of regression analysis. An empirical eval- semantics, and a function E ↓ () for obtaining the representationuation of our techniques, including several large case studies, is of an object to be stored in a trace, revisited in the next section.presented Section 5. Section 6 describes related work; conclusions For now, we ignore the representation of primitive values (thus,are given in Section 7. E↓(D (d ))= ), but assume that any object representation contains the original location, retrievable via l↓(µ) for a given µ. event e ::= F E | M E | KE | T E2. View Model f ield event FE ::= get(µ, f, µ ) | set(µ, f, µ ) method event ME ::= call(µ, m, µ ) | return(µ, m, µ )In this section, we present (i) an abstract model of execution traces object event KE ::= init(A, µ, µ )and a language whose evaluation produces such traces, and (ii) the thread event TE ::= fork(S ) | end(S )views trace abstraction. object µ ::= l trace entry τ ::= entry(eid, tid, m, µ, e )2.1 Language entry ID eid ∈ Z thread ID tid ∈ ZWe first introduce a simple object-oriented language whose syntax stack S ::= S∅ | S.s(m, µ, µ )is shown in Fig. 3; its syntax and semantics follows in the spirit ofother core object-based calculi such as Featherweight Java (FJ) [11] Figure 4. Trace syntax.or Classic Java [7]. 2.3 Dynamic SemanticsOur language augments FJ — modulo casts — with locations (ofthe form l (C ) ), field assignments (t .f =t), sequences of terms (t), Fig. 6 defines an operational semantics for our language. Globalvalue objects (new D (d )), and threads (T(t ;)). A corresponding evaluation is of the form t, τ , E , S =⇒ t , τ.τ , E , S in whichprogram is defined as such a thread term. Note that for brevity, t is an ordered sequence of thread terms, τ is the trace beinglocations l (C ) are sometimes simply written l when the type C is generated, E is an object store, and S is the ordered set of stacksnot germane to the context. corresponding to these threads. Local evaluation is of the form t, τ , E , S −→ t , τ.τ , E , S with j the index of the thread term program P ::= T(t ;) j class CL ::= class C extends C {A f ; K M } t being reduced at that step. Rules for local evaluation only include creation K ::= C (A f ){super(f );this.f =f ;} one stack, S , corresponding to the single thread term reduced. method M ::= A m (A x ) {t ; return t ;} Rule C ONGR -E relates local and global evaluation. Rule F ORK - type A ::= C|D E creates a new thread of control, and records a new entry in term t ::= x | v | t .f | t .f =t | t .m (t ) | new C (t ) the trace whose event captures the stacks representing the newly | new D (d ) | T(t ;) value v ::= l (C ) | D (d ) created thread’s parentage. Note that we track the creation context primitive d ∈ D for the full ancestry of a thread (spawn-point call stack, call stack (D, d, D) ∈ {(Bool, b, B), (Int, z, Z), ..., (Float, r, R)} of spawn-point of spawning thread, etc.), in order to increase the accuracy of correlating threads between different traces. Rule E ND - Figure 3. Core language syntax. E records the completion of a thread. Rule C ONS -E defines object creation; the event associated with the object creation trace entry2.2 Traces records the class name, the representation of the parameters to the constructor, and the representation of the created object itself.Notably, program evaluation yields traces, whose structure is de- Rule C ONS -VAL -E is defined similarly for primitives. Rules F IELD -scribed below and outlined in Figure 4. Such a trace is a sequence ACC -E and F IELD -A SS -E record trace entries for field access andof trace entries τ =τ 1.....τ n , with |τ | denoting its length (n). Differ- assignment. The definition of method call (rule M ETH -E) recordsent traces are identified by their names, shown in superscripts (e.g., a trace event that captures the object and method being invoked,τ L ). A trace entry entry(eid, tid, m, µ, e ) is a five-tuple consist- along with its arguments (the calling context being captured by theing of an entry identifier eid (the index of the entry in the trace), enclosing entry). Similarly, rule R ETURN -E records a trace entryand four items forming a generic “context”, namely: an identifier that captures details relevant to a return action; this includes a traceof the active thread (tid), the method under execution (m), a rep- event that records the method and object being returned from, andresentation of the object on which m is executing (µ), and an event a representation of the return value.e that captures a specific action. For now an object is considered tobe represented simply by its location l. 2.4 Views Trace AbstractionWe make use of stacks to trace method calls. A stack S is asequence of stack entries of the form s(m,l,l’), representing the Although complete, the traces yielded by our evaluation rules re-invocation of method m found in object l’ from object l. We write quire tedious interpretation to understand the underlying programS.s(m,l,l’) for appending such an entry to a stack S . We write S behavior, since these traces capture features of higher-level con-to denote an ordered set of stacks; each element in this set captures structs. We consequently formulate named projections over thesethe dynamic context of some actively executing thread. We write low-level traces, which link semantically related trace events.S1 · ... · Sn to enumerate the elements in this set. Similar notation is These projections, termed views, represent various levels of ab-used to enumerate an ordered set of threads in a program state. As straction in the executing program. Views are formed by mappingshorthand, we write S·S to denote the ordered set of stacks derived each trace entry to a set of view names, describing which specificby concatenation of the sets defined by S and S ; similar shorthand views an entry is a member of. This set of names is obtained byis used to define concatenation over ordered sets of threads. creating a union of the results of all of the view name mapping
  4. 4. Definitions C extends C {A f ; ...} class t,τ ,E ,S =⇒ t’,τ.τ ,E ’,S’ f ields(C )=A f f ields(Object)=∅ t, τ , E , Sj −→ t , τ , E , Sj f ields(C)=A f , Af j (C ONGR -E) C... {... M } T(...) · Tj (E[t]) · T(...) , τ , E , S · Sj · S =⇒ class C extends C {... M } class A m (A x ) {t ;} ∈ M ∃...m... ∈ M T(...) · Tj (E[t ]) · T(...) , τ , E , S · Sj · S mbody(m, C)=(x , t) mbody(m, C)=mbody(m, C ) t,τ ,E ,S −→ t’,τ.τ ,E ’,S ’ j E↓(D (d ))= E↓(l (C ) )=l Sj .s(m, µ, µ ) ∈ S τ =entry(|τ |, j, m, µ , fork(S )) Evaluation contexts (F ORK -E) T(...) · Tj (E[T(t ;)]) · T(...) , τ , E, S =⇒E ::= [] | E .f | E .f =t | v .f =E | E .m (t ) | v .m (E ) | new C (E ) T(...) · Tj (E[]).T(...) · T(t ;), τ .τ , E, S · S∅ | v ,E ,t | v ;E ;t | T(E ;) | E ;return t | v ;return E S=S · Sj .s(m, µ, µ ) · S τ =entry(|τ |, j, m, µ , end(S )) Figure 5. Definitions and evaluation contexts. T(...) · Tj (v ;) · T(...) , τ , E , S =⇒ T(...) · T(...) , τ .τ , E , S · S (E ND -E)functions (ν) (one mapping function is defined for each type). Spe- l ∈ dom(E ) E ={l→[f1:v1 , ..., fn:vn ]}E S=S .s(m, µ, µ )cific views are then obtained by trace projections over view namesof the form πpνφ |γ : T → T (see Figure 7), where T is the domain τ =entry(|τ |, j, m, µ , init(C, E↓(v), E↓(l))) f ields(C)=A f new C (v ), τ , E , S −→ l (C ) , τ .τ , E , Sof traces. pνφ |γ ranges over predicates of the form T→B with T the jdomain of trace entries. pνφ |γ models whether a given trace entry (C ONS -E)is a member of a specific view (as named by γ) of a particular viewtype (φ). We focus on four view types, which are defined by their S=S .s(m, µ, µ ) τ =entry(|τ |, j, m, µ , init(D, , E↓(D (d ))))view name mapping functions: new D (d ), τ , E , S −→ D (d ), τ .τ , E , S j (C ONS -VAL -E) • Thread views, νTH : One thread view is defined for each thread tid executing in the program; it contains precisely those events E (l)=[..., fi:v, ...] S=S .s(m, µ, µ ) that occur within that thread, in the order of execution. τ =entry(|τ |, j, m, µ , get(E↓(l), fi , E↓(v))) (F IELD -ACC -E) • Method views, νCM : One method view is defined for each l .fi , τ , E , S −→ v, τ .τ , E , S j fully qualified method name m (for simplicity the class name is omitted in the figures) in the program. Each method view E (l)=[..., fi:v, ...] E ={l→[..., fi:v , ...]}E S=S .s(m, µ, µ ) contains precisely those events that occur while that particular τ =entry(|τ |, j, m, µ , set(E↓(l), fi , E↓(v ))) method was at the top of the call stack. l .fi =v , τ , E , S −→ v , τ .τ , E , S j • Target object views, νTO : For each object a target object view (F IELD -A SS -E) is defined, containing precisely those events that occur when it is the target of a method call or field access. mbody(m, C)=(x, t) • Active object views, νAO : For each object an active object view S=S .s(m , µ, µ ) S =S.s(m, µ , E↓(l)) is defined containing those events that occur when it is on the τ =entry(|τ |, j, m , µ , call(E↓(l), m, E↓(v))) (M ETH -E) top of the call stack. l (C ).m (v ), τ , E , S −→ {/ ,v/ }t, τ .τ , E , S l this x jThe key to effective program analysis using our view model of S=S .s(m , , ).s(m, µ, µ )tracing is that all these views are linked together. In the present τ =entry(|τ |, j, m , µ, return(µ , m, E↓(v )))context it is sufficient to view these links as being implicitly given (R ETURN -E)by retaining indices of the original trace in the projected views. v ;return v , τ , E , S −→ v , τ .τ , E , S jFor example, a trace event recording a method call o.m(...) willbe recorded in the thread view of the thread in which the call is Figure 6. Program evaluation.performed, in a method view for m, in the active object view of themethod performing the call, and in the target object view for target foundation for such an analysis consists in the comparison of traceobject o. The trace index found in the entry can be used to navigate pairs. This section develops a semantics for “evaluating” pairs offrom the entry found in one view to its position in another. In this such traces to identify semantic differences.way, a program as a whole may be modeled as a complex “web” ofinterconnected views. At any arbitrary point in any view, one can 3.1 Trace Differencing Semanticsuse these links to visit all semantically related views (as modeledby the defined view types), thereby organizing both the exploration We present a small-step semantics for “evaluating” pairs of tracesof program execution as well as the presentation of the results of (τ L , τ R ) (termed the left and right traces) by comparing them insuch an analysis in more meaningful ways. order to formalize the trace differencing process. The evaluation produces a set, Λ, containing those trace entries that are considered3. Views-Based Trace Differencing to be similar between the two traces. The set of differences is then easily computed from Λ and the original trace. We assume thatThe comparison of traces through comparison of their correspond- before evaluation a special eof trace entry is appended to eaching view trace webs allows for a powerful program analysis. The trace, and is also appended as many times as needed to the shorter
  5. 5. ( τ 1.πp (τ ) p(τ 1 )πp (τ 1.τ ) = πp (τ ) otherwisepνφ |γ (τ ) = (γ = ⊥ ∧ γ = νφ (τ ))νTH (entry(i, j, m, µ, e)) = TH, j Figure 10. The LCS of two strings. Note that moved subsequencesνCM (entry(i, j, m, µ, e)) = 8 , m CM (e.g., “XY”) are not detected. > TO, l↓(µ > ) e = call(µ , m, µ ) > TO, l↓(µ > > ) e = return(µ ,m, µ ) > τ L , τ R , Λ −→ τ L , τ R , Λ > < TO, l↓(µ ) e = get(µ , f, µ )νTO (entry(i, j, m, µ, e)) = L > TO, l↓(µ > ) e = set(µ , f, µ ) ( {τ 1 } τ 1 ∈ lcs(τ OL , τ OR ) > > TO, l↓(µ > ) e = init(A, µ , µ ) > > Λ = ⊥ otherwise {} otherwise :νAO (entry(i, j, m, µ, e)) = AO, l↓(µ) (S TEP -L EFT-LCS) τ 1.τ L , τ R , Λ −→ τ L , τ R , Λ ∪ Λ LFigure 7. Projection (πp ), the view identification predicate (Similar for (S TEP -R IGHT-LCS) rule)(pνφ |γ ), and entry to view name mappings (ν φ ) Figure 11. LCS comparison; −→ is parameterized over L τ OL , τ OR , which are defined to be the traces produced bytrace until both traces are the same length. The resulting augmented evaluation of the two programs to be compared, before evaluatedtrace syntax is presented in Fig. 8, along with an extended object under −→ or −→.representation. Note that locations by themselves are unsuitable L Vfor comparison across different program versions. We thus extendobject representations to tuples which now include, besides their We present two trace differencing semantics — one leveraginglocation in the case of non-value objects, an identifier (or hash) that the longest common subsequence (LCS) algorithm, and the otherrepresents a recursively computed value representation. leveraging our views trace abstraction. object µ ::= l, r serialization r ::= D:[d] | C:[r] 3.2 LCS-based Trace Differencing Semantics trace entry τ ::= entry(eid, tid, m, µ, e ) | eof E ↓(D (d ))= , D:[d] E (l)=[f1:v1 , ..., fn:vn ] Well-known differencing tools such as Unix diff are founded on E ↓(l (C ) )= l, C:[E ↓(v1 ), ..., E ↓(vn )] longest common subsequence (LCS) algorithms [3] in order to de- termine a minimal set of differences between two sequences (e.g.,Figure 8. Extended trace syntax and object representation used for lines of text in two files). Fig. 10 visualizes how the LCS identifiesdifferencing. the differences between two strings. Fig. 11 presents an evaluation semantics that leverages the LCS of the two traces in order to de- termine which trace entries should be placed in Λ. Note that twoOne important aspect to effective view-based differencing (or any trace entries are considered to correspond based on the equalityanalysis operating over views from multiple executions or versions) predicate, =e . The LCS evaluation relation relies on preserving theis in the formulation of effective view correlation functions (X), original traces before any evaluation takes place. The computationwhich are used to determine if a given view in one execution trace of the LCS also results in a correspondence mapping between allsemantically corresponds to a given view in a different execution common entries. This allows each contiguous run of differences intrace. Fig. 9 defines the type signature for all correlation functions, the traces to be viewed as either an insertion, deletion, or modifi-as well as some notation and helper relations for our differenc- cation. Using the LCS to understand differences between programing evaluations. One correlation function needs to be defined for traces is beneficial for correlating similar yet not exactly identicaleach view type. The correlation function accepts two trace entries events. For example, if a new version of a program adds a new pa-instead of two view names as arguments, because the correlation rameter to a function, the LCS will gravitate towards correlatingfunction may be context-sensitive (e.g., based on value representa- identical values, thereby identifying the new parameter as the onetions) as opposed to solely the view names. difference.Thread view correlation (XTH ) is determined by considering all pos-sible thread correlations (pairs of threads) and forming a correlation However, there are two major challenges in applying the LCS al-with the “closest match” based on the spawning call stack of the gorithm to execution traces. First, the algorithm for computing thethread (and the thread’s ancestors). Method correlation (XCM ) cor- LCS is not aware of program semantics and may blindly correlaterelates two methods if their full type signatures are equal. Target neighboring entries in the original trace with entries very far apartobject and active object correlation (XTO and XAO ) correlate two in the new trace (e.g., consider commonly occuring values, suchobjects if either the value representations or class-specific object as 0 or null). Second, the computational complexity of bare LCScreation sequence number (derivable from trace data) are equal. is Ω(n2 ) [3], making it intractable on long program traces. Faster solutions are known if the input “alphabet” is fixed, but are notBecause view correlations attempt to identify relationships among applicable in the present case, as the input alphabet includes valueprogram abstractions (i.e., threads, methods, objects) found across representations, which are innumerable. The standard dynamic pro-executions based only on the structure of their views, they are gramming algorithm requires O(n2 ) space in order to reconstructbest regarded as heuristics. Nonetheless, experimental results (Sec- the LCS (not just its length). The algorithm becomes impracticaltion 5) show that the implemented correlation functions are effec- when applied to longer program traces, requiring huge memory be-tive in practice for regression cause analysis. We believe other cor- sides time to compute the LCS. Existing algorithms with reducedrelation definitions may be useful for other kinds of analyses. space complexity require roughly twice the computation time [9].
  6. 6. Event equality τ 1 =e τ 2 True if the underlying primitive values (including those generated by E↓ in Fig. 8) of the events of the two entries are equalTrace index index(τ , τ ) The index of the entry in τ which has an eid that index(τ( , entry(j, k, m, µ, e )) = matches the eid of the entry τ i τ = τ 1.....τ i−1.entry(j, n, m , µ , e ).... −∞ otherwise τ ∩=e τ (Trace τ with only those elements that are also in τ according τ 1.(τ ∩=e τ ) ∃τ i ∈ τ : τ 1 =e τ iintersection to event equality (=e ) (τ 1.τ ) ∩=e τ = τ ∩=e τ otherwiseTrace window win(τ ,∆) (τ ) Trace τ with only those elements whose index is in the win(τ ,∆) (τ ) = πp∆ (τ ) range [index(τ , τ ) ± ∆] (∆ constant, τ ∈ τ ) p∆ (entry(i, j, m, µ, e )) = (i ∈ [index(τ , τ ) ± ∆])Correlation Xφ (τ 1 , τ 2 ) Returns νφ (τ 1 ), νφ (τ 2 ) if the views of type φ of the View correlation functions, as described in Section 3.1 two entries are correlated or ⊥, ⊥ otherwiseLCS lcs(τ , τ ) Longest common subsequence of τ and τ ’ with respect Refer to [3] for details to event equality (=e ) Figure 9. Helper relations for trace differencing evaluation semantics. τ L , τ R , Λ −→ τ L , τ R , Λ of similar entries. Evaluation thus alternates between the two rules V until the end of the traces is reached. τ 1 =e τ 2 The generic rule (S IMILAR -F ROM -L INKED -V IEWSφ ) defines how to τ 1.τ L , τ 2.τ R , Λ −→ τ L , τ R , Λ ∪ {τ 1 , τ 2 } construct LinkedSimilarEntries for a given view type, φ. V (S TEP -V IEW-M ATCH ) Not shown are the concrete rules, one for each view type (with φ instantiated to TH, CM, TO, or AO). To further explain the Λ = {τ | LinkedSimilarEntries(τ 1 , τ 3 , τ r )} r LinkedSimilarEntries relation, for a given pair of trace en- τ 1 =e τ 3 τ 2 =e τ 4 τ L ∩=e τ R = tries from the left and right thread views (τ 1 , τ 3 ) for whom secondary views should be explored to identify similar entries, τ 1.τ L .τ 2.τ L , τ 3.τ R .τ 4.τ R , Λ −→ a similar entry τ r is identified according to the antecedent of V τ 2.τ L , τ 4.τ R , Λ ∪ Λ (S IMILAR -F ROM -L INKED -V IEWS ). The first and second lines of the an- (S TEP -V IEW-N O M ATCH ) tecedent constrain the free variables τ 5 and τ 6 to be trace entries within a constant distance from the entries τ 1 and τ 3 respectively. index(τ , τ ) ∈ ˆindex(τ VL , τ 1 ) ± ∆ ˜ VL 5 ˆ ˜ The third line requires that entries τ 5 and τ 6 either have corre- VR 6 VR 3 index(τ , τ ) ∈ index(τ , τ ) ± ∆ sponding thread views, method views, or object views (which views γ L , γ R = Xφ (τ 5 , τ 6 ) the two entries have in correspondence are called matching views). r τ ∈ lcs(win(τ 5 ,∆) (πpν |γ L (τ OL )), Line four then constrains free variable τ r such that it must be in φ the LCS of fixed-size windows of the matching views. In this way, win(τ 6 ,∆) (πpν R (τ OR ))) similar entries in corresponding secondary views are identified us- φ |γ LinkedSimilarEntries(τ 1 , τ 3 , τ r ) ing LCS, but over fixed-size windows of the nearby entries in the (S IMILAR -F ROM -L INKED -V IEWSφ ) views as opposed to the entire view or the entirety of the raw traces.Figure 12. View-based comparison; −→ is parameterized over When differencing a raw pair of traces produced by a program, we V evaluate each pair of corresponding thread views (as determinedτ OL , τ OR ,τ VL , τ VR . We define τ VL , τ VR to be the two thread by XTH ) under −→, each producing a different set Λ. These sets areviews being compared, but before being evaluated under −→. V V unioned together to form the final similarity set, Λf . The final set of differences is then derived from Λf by set subtraction. We have3.3 Views-based Trace Differencing Semantics shown our technique to exhibit O(n) complexity in both space and time; the proof has been omitted due to space restrictions and isThe aforementioned challenges of LCS can be overcome by lever- available in a tech report [10].aging our views trace abstraction. Instead of differencing the rawpair of traces produced by the contextual semantics, we instead ap-ply a trace differencing evaluation semantics to one or more pairs 3.4 Exampleof correlated thread views. Evaluation of each pair of views (−→) Vproceeds by removing the head elements and if they are equal, plac- Fig. 13 shows how the differencing algorithm works for the mo-ing them in set Λ. For the other case where the head elements are tivating example shown in Fig. 1. As each trace evaluation ruledifferent, any secondary views linked to nearby entries in the main removes one or more entries from the head of the traces, theviews are explored to find correlations between entries in the left numbered circles show the progression of the evaluation as rulesand right traces, and then removing from the heads of the left and are applied. From the start of the trace up to and including theright main views any entries up until the next common point of call to the setRequestType method (from circle 0 to 1), thecorrelation. (S TEP -V IEW-M ATCH ) is applicable, adding these entries to set Λ. At circle 1, the (S TEP -V IEW-N O M ATCH ) rule becomes applicable. τ 1 andFig. 12 formalizes the above intuition of how one pair of thread τ 3 correspond to entry 9 in the left and right traces, respectivelyviews is evaluated. The rule (S TEP -V IEW-M ATCH ) handles the case (the fact that the entry id is the same is coincidental here).where the heads of the evaluated thread view pair have equalevents, in which case the entries are removed and added to Λ. LinkedSimilarEntries for (τ 1 ,τ 3 ) identifies those entries thatThe (S TEP -V IEW-N O M ATCH ) rule handles the other case. It adds to should be placed in set Λ due to the exploration of correspondingΛ those entries that were found to be similar by comparing cor- secondary views nearby to these entries. The two secondary viewsresponding views nearby the differing entries, as modeled by displayed in the figure represent two such corresponding views.LinkedSimilarEntries. The evaluation step also removes from Circles 2 and 3 depict the relevant fixed windows in the secondarythe heads of the traces any differing entries up until the next pair views over which the LCS is computed, and the checkmarks in
  7. 7. Key Original Thread View New Thread View 0 Lock­step … ­­> LOG­1.addMsg(Handling..)  ­­> LOG­1.addMsg(Handling..) … scanning …     ...      ... … Views … <­­ LOG­1.addMsg(..) <­­ LOG­1.addMsg(..) … Exploration  … ­­> SP­1.setRequestType(text/html) ­­> SP­1.setRequestType(text/html) … Skipped  …     ­­> STR­1.equals(text/html)     ­­> STR­1.equals(text/html)  … Mark Anchors …     <­­ STR­1.equals(..) ret=true     <­­ STR­1.equals(..) ret=true In Other Views 1     ­­> BINFLT­      ­­> NUM­, 127) 4          ­­> NUM­, 127)         set NUM­1._minCharRange = 32              set NUM­1._minCharRange = 1         set NUM­1._maxCharRange = 127 5             set NUM­1._maxCharRange = 127       <­­ NUM­, 127) 6         <­­ NUM­, 127)      set SP­1._binConv = NUM­1          set BINFLT­1._binConv = NUM­1 7      <­­      ­­> SP­1.addFilter(BIN)     ... 8     ...     ­­> LOG­1.addMsg(Set req..)     ­­> LOG­1.addMsg(Set req..) …         ...          ... …     <­­ LOG­1.addMsg(..)     <­­ LOG­1.addMsg(..) … <­­ SP.setRequestType(..)  <­­ SP.setRequestType(..) … 9 Original NUM­1 Object View New NUM­1 Object View ­­> NUM­, 127)  ­­> NUM­, 127) set NUM­1._minCharRange = 32  set NUM­1._minCharRange = 1 set NUM­1._maxCharRange = 127  set NUM­1._maxCharRange = 127 … <­­ NUM­, 127) 2 LCS  <­­ NUM­, 127) … ­­> NUM­1.process(..)  ­­> NUM­1.process(..) … ...  ... ﴾different﴿ Original SP.setRequestType Method View New SP.setRequestType Method View ­­> STR­1.equals(text/html) ­­> STR­1.equals(text/html) <­­ STR­1.equals(..) ret=true <­­ STR­1.equals(..) ret=true ­­> NUM­, 127)  ­­> BINFLT­ <­­ NUM­, 127) 3 LCS  <­­ set SP­1._binConv = NUM­1  ­­> SP­1.addFilter(BIN) ...  ... … ­­> LOG­1.addMsg(Set req..)  ­­> LOG­1.addMsg(Set req..) … <­­ LOG­1.addMsg(..)  <­­ LOG­1.addMsg(..) …Figure 13. How the views-based trace differencing semantics works for part of our example. The primary view and two secondary viewsare shown. Evaluation progression is labeled with circles 0–9. denotes entries that differ. !"""# !"""# denotes entries placed in set Λ via evaluation !"""#of the (S TEP -V IEW-M ATCH ) rule. denotes entries placed in set Λ via evaluation of the (S TEP -V IEW-N O M ATCH ) rule. For example, the thread view !"""#entry at circle 5 (set NUM-1._maxCharRange = 127) was marked with due to comparisons within the NUM-1 object view.these views depict which entries are thereby identified as corre- (thereby exploring nearby secondary views that correspond), andsponding (and thus should be in set Λ). Entries thus identified as identifying the next point of correspondence as the position at cir-corresponding due to exploration in secondary views are denoted cle 8. The (S TEP -V IEW-M ATCH ) is then applied for all remaining traceswith the anchor symbol in the figure. until the end is reached at circle 9.Note that even though these entries appear “nearby” to the currentpositions in the thread views in this example, in large traces these 4. Regression Cause Analysisentries identified as similar from secondary views could be thou-sands or hundreds of thousands of trace entries away. In this way We envision many types of dynamic analyses benefiting from ourexploration of secondary views allows for recognizing semantically views trace abstraction and our views-based trace differencing se-correlating events that could be very far apart in the thread views. mantics, including object protocol inference, property checkingThis characteristic is one reason this approach allows for greater (e.g., typestate), impact analysis, and automated debugging. Asaccuracy than LCS, as this approach remains resilient to reorder- mentioned we focus our attention in this paper on how semantics-ings of operations in the thread views whereas LCS identifies these aware trace differencing empowers regression-cause analysis.reorderings as differences.With LinkedSimilarEntries fully formed for the given point 4.1 Algorithm(τ 1 ,τ 3 ), the antecedent of (S TEP -V IEW-N O M ATCH ) identifies the nextclosest point of correspondence (circle 5) in the left and right thread To identify the causes of regressions, we employ our semantics-views (τ 2 ,τ 4 ) and removes all entries from the heads of the traces aware trace differencing to identify all the semantic differencesup to but not including this next point of correspondence (circle 4). between an original, non-regressing version and a new, regressing version of a program for one or more regressing test cases. LetEvaluation thus proceeds alternating between these two rules un- A represent this set of semantic differences, termed the suspectedtil the end is reached: at circle 5, (S TEP -V IEW-M ATCH ) is applied, differences set. The size of set A is usually too large to be effectiveplacing entry 11 (left) and entry 12 (right) into set Λ and pro- for finding the regression cause through manual inspection (havingceeding to circle 6. Here (S TEP -V IEW-N O M ATCH ) is applied again, in practice hundreds or thousands of differences). The goal of theLinkedSimilarEntries is formed again for this unique point analysis algorithm is thus to remove from set A those differences
  8. 8. that are not likely to have caused the regressing behavior. The Even though there are many differences in the new version ofalgorithm proceeds as follows: the code, only seven are relevant to the regression and its cause. Our tool correctly identifies these seven changes, with no false positives. It also correctly identifies 20 other runs of differences1. A set of differences that are expected to occur under correct as not being related to the regression. execution between the original and the new version is built (call this set B), by comparing execution traces for runs of To achieve these results we created two test cases: (a) a test case the two versions for test case(s) with correct behavior. These that reproduced the regression, and (b) a test that used a different differences are not likely to be related to the cause of the document type, so conversion of the characters was not applied in regression, because the regressing behavior was not observed both versions (and thus does not exhibit the regressing behavior). during execution of these correct test case(s). This set B is the First, we collected dynamic traces for the original and new versions expected differences set. for both test cases. Next, the semantic differencing tool was run on2. A set of differences for the new version between a correct the following pairs of traces: (i) original version vs. new version test case and the regressing test case is built (C). This set of for the regressing test case (forms the suspected differences set); differences includes the difference(s) that cause the regressing (ii) original version vs. new version for the non-regressing test case behavior. The correct test case should be similar in functionality (forms the expected differences set); (iii) a new version of the non- to the regressing test case so that the resulting set of differences regressing test case vs. a new version of the regressing test case is small and yet still includes the difference(s) causing the (this forms the regression differences set). regression (this is similar to the requirements of [21] and can be computed automatically using techniques from [22]). This set C is termed the regression differences set. 5. Evaluation3. The set of differences between the original and new version RP RISM employs aspect-oriented programming (AOP) [12] to that are highly likely to be responsible for the cause of the dynamically instrument Java programs via AspectJ’s load-time regression, termed D, is now calculated as follows: weaver. Pointcuts provide a flexible mechanism to capture the seg- D = (A − B) ∩ C ments of the execution trace that should be recorded. The imple- mentation is layered and provides the following features: trace seg- mentation, call stack tracking, event recording, view construction,The choice of set B impacts the effectiveness of this approach trace serialization, and tracing control. Tracing of long-runningbecause the cause for a regression can appear within the execution programs are accomodated through smart trace segmentation–trace for non-regressing test cases. Eliminating the differences may AspectJ pointcuts are used to specify relatively short regions ofthereby eliminate the cause, introducing false negatives. program execution to record as a single trace segment, and onceIn practice, however, we find that filtering differences as described a trace segment has finished executing, all trace data is offloadeddoes not compromise accuracy for three reasons. First, our dynamic to disk and the associated tracing memory is reclaimed. While thetraces are complete, so the cause of the regression must be in A, program is running the execution trace is collected but not ana-implying there are no false negatives at this stage. Second, elim- lyzed – the analysis is performed offline after the trace data hasinating the expected differences (B) eliminates many false posi- been serialized to disk.tives, increasing accuracy. Since these differences did not exhibit RP RISM implements the view correlation of Section 3.1, and alsowithin the program output, it is highly likely they are related to relaxes method and object view correlation (to be more tolerantcorrect program evolution instead. Third, intersecting with the re- to refactorings) using a context-sensitive correlation function thatgression differences set (C) further eliminates false positives. As correlates views if their entries are the same “distance” (number ofC only contains differences within the new program version, there trace entries) away from two points in the traces that are known toare fewer differences (only those caused by the differing inputs of be semantically correlated to each other. This relaxation improvesthe regressing and non-regressing test cases and excludes any dif- robustness in the presence of refactorings, such as if methods orferences due to program evolution), and yet is still likely to contain classes have been renamed, or methods have been split or com-the regression cause. bined. For example, if a method has been renamed, then there will be a difference at the point of the call site, but it is probable thatThe other source of false negatives is due to A∩C. If the regression there will be call sites in the versions where either the immedi-is caused by the removal of code in the new version, then there is no ately preceding or succeeding statement is semantically correlated.possibility for set C to contain the regression-inducing differences. Under the relaxation, the secondary views will be explored at theFor these cases, the analysis can be changed to: point of the call site of the renamed method. When the secondary D = (A − B) − C views are analyzed, the (mostly) unchanged code in the method whose name was refactored produces many new semantic correla-Subtracting set C for these cases will further reduce false positives tions back in the main view, thus providing tolerance to the methodwithout introducing false negatives, allowing the analysis to effec- rename refactoring.tively find regression causes due to the removal of code in newerversions. We empirically evaluate the effectiveness of our approach RP RISM approximates the value representations in our formalismin practice in Section 5. using the Java hashCode and toString (truncated to 128 chars) methods, which we found is a good approximation in practice, providing both reasonable performance and accuracy. Note that if4.2 Example Revisited an object uses the default java.lang.Object implementation of these methods, the value representation is forced to be empty asRecall that a regression is caused in our example when the class it is not meaningful across different program versions.BinaryCharFilter is instantiated with the incorrect range of[1..127] instead of [32..127]. The most interesting code fragments In this section we first quantitatively assess the effectiveness of ourrelated to this regression are depicted in Fig. 1. semantics-aware trace differencing, comparing it to an optimized
  9. 9. version of LCS with respect to both accuracy and performance. Measurements. To assess the effectiveness of our view-basedThe differing software versions for this assessment are based on trace differencing technique we define two measures: accuracy andthe iBUGS project [5]. We further evaluate our regression-cause speedup. Accuracy measures how many semantic correlations areanalysis by discussing in detail RP RISM’s accuracy and perfor- identified with RP RISM vs the number of correlations identifiedmance on four real-life regressions. All tests were executed on a with the LCS comparison. Accuracy is precisely defined using theserver with eight 1.8Ghz dual-core Opteron processors and 32GB following formula:of RAM. Note that RP RISM is currently single-threaded, with the ((totalEntries − rprismNumDiffs)/totalEntries)extra cores only in use during the VM’s parallel garbage collection. Accuracy = ((totalEntries − lcsNumDiffs)/totalEntries) 4 6 Note that because RP RISM can identify reorderings of operations it often identifies fewer semantic differences than LCS (resulting in 5 more semantic correlations), resulting in an accuracy value greater 3 than 100%. An accuracy of 100% in this section should be read as 4 meaning “RP RISM identified the same number of semantic differ- N u m b e r o f C a se s Number of Cases ences as in the LCS comparison.” Speedup is defined as the number 2 3 of trace entry compare operations performed during the LCS com- parison divided by the number of compare operations performed 2 during comparison with RP RISM. 1 1 Results. Salient measurements from our experiments are shown in Fig. 14. Most traces were between 10K and 100K entries, with 0 0 0.5x 1x 5x 10x 50x 100x 500x 1000x 2500x 5000x a few outliers ranging up to 1.9 million entries. Trace size was 99% 100% 105% 110% 125% 150% 200% A ccu ra cy (R P rism v s L C S ) Speedup (RPrism vs LCS) optimized by leveraging AspectJ pointcuts to exclude the internal workings of unrelated code, such as libraries and data structures. (a) Accuracy (b) Speedup RP RISM organizes contiguous sets of differences into difference se-Figure 14. Quantitative results based on iBugs Rhino dataset com- quences, thereby organizing tool output into comprehensible units.paring RP RISM with an optimized LCS implementation. More than two-thirds of the bugs produced less than 50 difference sequences, with the remainder ranging from 50 to 130 difference sequences. We found that the cause of the divergence is often ob-5.1 Quantitative Assessment served in the first handful of differences sequences. If further refine- ments are required, an alternate but non-regressing test case can beOur goal is to assess both (i) the semantics-aware trace differenc- created, and then the number of difference sequences is typicallying, and (ii) the regression-cause analysis. Existing research bug cut by an order of magnitude or more (see next section).databases contain few if any real regressions (focusing on explicitbugs instead). Without any immediately suitable benchmark regres- We calculated the precise LCS where possible using an optimizedsion bug database, we built on the Rhino dataset in the iBUGS version of the LCS algorithm (common-prefix/suffix optimiza-project [5]. The Rhino dataset is a set of 29 bugs from the Mozilla tions). Fig. 14(a) measures accuracy by comparing the relativeRhino project,6 which implements JavaScript in Java and consists number of trace entries marked as different by each approach.of 304 KLOC (including tests), 242 classes, and 15K tests. Rhino RP RISM achieves greater than 100% accuracy in all but 3 casescompiles JavaScript into an intermediate form, which is then ei- because it is able to make semantically correct correlations (suchther interpreted or compiled to standard Java classes. Our data here as detecting moved trace entries) that the LCS is inherently not ableis based on the interpretive mode, as it produced longer and more to detect. The remaining 3 cases had accuracy greater than 99%.complex traces, but RP RISM runs equally well with the compiled We evaluate RP RISM’s efficiency by measuring speedup relative tomode. the number of compare operations (Fig. 14(b)). The LCS approach failed on traces longer than 100K entries (due to memory exhaus-We integrated RP RISM into the automated build process, provid- tion), whereas RP RISM successfully analyzed traces as long as 1.9ing unattended runs over all bugs. We introduced regressions into million entries. RP RISM achieved speedups of more than 100x vseach post-fix version by either using the actual cause of the bug it- the LCS algorithm. For two very small traces RPrism had speedupsself if the bug was a regression or by using a distribution of root less than 1x, because of the extra comparisons in secondary views.causes that matches the distribution found for semantic bugs in the Observed wall clock times for trace differencing are also reason-Mozilla project in an empirical study [13]. Root causes considered able – running time of the algorithm took less than 1 second in allare categorized as missing features (26.4%), missing cases (17.3%), cases, except for the trace with 1.9 million entries, which took 110boundary conditions (10.3%), control flow (16.0%), wrong expres- seconds.sions (5.8%), or typos (24.2%). We ensured that each injected re-gression caused the test case associated with the bug to fail. Note that the histograms include data for only 14 of the 29 bugs in the iBugs Rhino suite, for the following reasons: Two bugsWe utilized RP RISM to trace the working and regressing versions were not runnable due to a problem with the iBugs distribution.and to calculate the differences between the traces. As we are The LCS algorithm failed due to memory exhaustion for 3 bugs.modeling what RP RISM would provide to developers if it were part The remaining bugs had trouble generating tracing data, due toof a completely automated build process, we do not follow the final problems with the AspectJ weaver (invalid bytecode was producedstep of manually creating similar non-regressing test cases, which by the AspectJ weaver).would only increase accuracy further. A developer could completethis final step if they wished to further refine RP RISM output when 5.2 Real-life Regressionsfixing the regression. To assess RP RISM’s utility in identifying regression causes, we6 analyze four previously documented regressions in significantly
  10. 10. LCS-based Differencing Views-based Differencing Benchmark LOC Trace Tracing Num Diff. Regression False False Analysis Mem Num Diff. Regression False False Analysis Mem Speedup Entries Secs. Diffs. Seqs. Diff. Seqs. Pos. Neg. Secs. (GB) Diffs. Seqs. Diff. Seqs. Pos. Neg. Secs. (GB) Daikon 169K 15K 185 352 43 3 0 1 44 0.85 179 42 3 0 1 3.4 0.07 12.9x Xalan-1725 365K 98K 90 2,338 145 0 0 1 1,515 26.9 1,197 296 1 0 0 18.3 0.10 82.8x Xalan-1802 286K 44K 99 4,269 117 11 0 0 582 7.43 3,602 184 10 0 0 61.5 0.21 9.4x Derby-1633 720K 337K 182 (out of memory failure at 32GB) 125,562 2,663 6 4 0 80.1 0.34 - Table 1. Benchmark and analysis characteristics (time/memory are median of 3 runs). Number of Views Size of Analysis Sets Benchmark Total views Thread views Method views Target object views A B C D Daikon 559 1 127 431 42 NA 22 3 Xalan-1725 1,679 1 446 1,232 296 243 113 1 Xalan-1802 1,811 1 497 1,313 184 183 10 10 Derby-1633 6,874 3 1,761 5,110 2,663 4 310 10Table 2. Number of views (in the original program version only) and size of the sets in the regression-cause analysis algorithm for ourbenchmarks. Set A is the suspected differences set, set B is the expected differences set, set C is the regression differences set, and set D isthe result of the analysis.sized open-source software projects, namely Daikon [6], Apache Daikon. Daikon is an extensible tool for dynamically detectingXalan7 (2 regressions), and Apache Derby.8 Our reasons for choos- likely program invariants (169 KLOC, 1100 classes). We revisiting these regressions are as follows: The Daikon regression was a regression first considered in the evaluation for JUnit/CIA [17].chosen because this exact same regression was also evaluated by This regression is exhibited by an outdated version of the testXorJUnit/CIA [17], providing a comparison to a previously established test case. The regression was caused by changes in two meth-method for regression-cause analysis. The first Xalan regression ods in class daikon.diff.XorVisitor, namely shouldAddInv1 andwas chosen because it involved an extreme separation of cause and shouldAddInv2 [17]. The older testXor version was selected as theeffect, as the cause is within a dynamic bytecode compiler and the regressing test case and the newer testXor version was selected asvisible effects are not exhibited until the compiled bytecode is exe- the non-regressing test case. Note that Daikon took the longest tocuted. The second Xalan regression was chosen because the cause trace even though it produced the shortest traces because the testof the regression was in a completely rearchitected module in the case involved the most number of distinct classes, resulting in 98%code, and exhibited a large amount of code churn in general (79K of the time for tracing spent performing aspect weaving.lines); we wanted to observe how this level of code churn would Out of 42 difference sequences, RP RISM identified 3 of these asaffect accuracy. The Derby regression was chosen because it in- relating to the regression; 2 of these differences correctly identi-volved multiple threads, a larger code base (2x), and offered larger, fied changes in the shouldAddInv2 method as the regression cause,longer-running traces (3x) than the other regressions. although the change in shouldAddInv1 was not identified (a falseThe versions to use when analyzing each regression were deter- negative). The other difference was related to the effect of the re-mined as follows: With Daikon, we used the versions as published gression but not the causes. Notably, RP RISM was more precisein the evaluation of JUnit/CIA. For the others, we chose the last than JUnit/CIA for this regression. Whereas JUnit/CIA also cor-publicly released version that was working correctly, and the first rectly identified the changed methods that caused the regression,publicly released version after the regression was reported but not it also labeled 2 other changes as “Red” (highly likely to be theyet fixed (at least 5 months between the versions chosen in all cause), and 31 other changes as “Yellow” (changes that might be acases). cause) [17].Table 1 summarizes characteristics of the benchmarks and our anal-ysis results. For comparison, we present the results of the regression Apache Xalan. Xalan is an implementation of XSLT, an XMLanalysis based on both the LCS-based and view-based differenc- transformation language (365 KLOC, 1500 classes). We considering semantics. Num Diffs. states the number of distinct differences a bug from Xalan’s bug database, XALANJ-1725,9 involving aidentified. Diff. Seqs. states how many difference sequences (each regression from version 2.5.1 to 2.5.2. Version 2.5.2 incorporatesrepresenting one higher-level semantic difference that manifests as 4 months of code changes, including 84 feature enhancements anda contiguous set of differences) were formed from the raw differ- bug fixes. The cause of the regression is quite difficult to pinpointences. Regression Diff. Seqs. states how many of these sequences because the bug is in the XSLT compiler (which generates Javawere identified by RP RISM as being regression related. False Pos. bytecode). This is an extreme case of separation of cause and effect,states the number of semantic differences incorrectly identified as as the former lies in incorrectly generated bytecode, so the latterregression-related by RP RISM; False Neg. states the number of only manifests upon execution of that bytecode. This makes itregression-inducing differences (as identified by the developers in extremely difficult for any static analysis technique to accuratelythe bug description) that exist between the non-regressing and re- understand and identify precisely both cause and effect.gressing versions that were not identified. Speedup is based on wall The bug report provided an XSLT file that was correct on versionclock time of the differencing analysis. Table 2 summarizes the 2.5.1 but not on version 2.5.2. To obtain a similar non-regressingnumber of views and the sizes of the regression analysis sets. Note test case, we modified the XSLT file and removed the small sectionthat the contents of sets A, B, C, and D can all be very different, of the file that was causing incorrect behavior while leaving thewhich is why |D| can be much smaller than |C| and why |D| can remainder of the file the same, and it was constructed withoutbe larger than |A| − |B|. foreknowledge of the regression cause. Alternatively, automated techniques could be applied to construct the alternate test case [22].7 9