A Logic Meta-Programming Foundation for Example-Driven Pattern Detection in Object-Oriented Programs


Presentation at the Postdoctoral symposium of the 2011 International Conference on Software Maintenance, accompanying the paper

  1. A Logic Meta-Programming Foundation for Example-Driven Pattern Detection in Object-Oriented Programs

     Post-doctoral Symposium Track, International Conference on Software Maintenance, 29/09/2011, Colonial Williamsburg, VA (USA). Coen De Roover, Software Languages Lab, Vrije Universiteit Brussel. Promotors: Wolfgang De Meuter, Johan Brichau.
  2. General-Purpose Pattern Detection Tools

     Identify code of which the user specified the characteristics: structural (e.g. class Ouch { int hashCode() { ... return ...; } ... }) or control flow and data flow (e.g. scanner = new Scanner(); scanner.close(); scanner.next();).

     Let me explain the title first. Given its length, this will take a slide or two. First off, what do I consider a "general-purpose pattern detection tool"? Well, that's any tool that identifies code of which the user has specified the characteristics. Those characteristics can be related to the structure of a program, but also its control flow and data flow. The slide illustrates the difference. Consider possible violations of the invariant that equal objects have equal hash codes. Those are characterized structurally by a method hashCode(), without a corresponding equals() method. Reads from a closed scanner, on the other hand, are characterized by control flow characteristics (next is invoked after close), and data flow characteristics (both on the same scanner).

     Now, if such a tool existed, it could be used for a variety of purposes. For instance, to check whether the protocol of an in-house API is used correctly. Or to check whether an application-specific bug you just discovered isn't more widespread. Or even to check whether someone is instantiating a class for which a factory method exists. In short, it could be used to detect a lot of interesting application-specific patterns ... not just design patterns :)
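The two kinds of characteristics from slide 2 can be made concrete with a small Java sketch. The names `PointWithHashOnly` and `PatternExamples` are hypothetical and do not come from the talk. The first class exhibits the structural characteristic (a hashCode() without a corresponding equals()); the method exhibits the flow characteristic (next() invoked after close() on the same scanner), which `java.util.Scanner` rejects with an `IllegalStateException`.

```java
import java.util.Scanner;

// Structural characteristic: hashCode() is overridden, equals() is not.
// (Hypothetical class, used for illustration only.)
class PointWithHashOnly {
    final int x;
    final int y;

    PointWithHashOnly(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
    // No equals(): instances with the same coordinates share a hash code
    // but are still compared by identity.
}

class PatternExamples {
    // Control flow and data flow characteristic: next() is invoked after
    // close() on the same scanner. Returns true if the read was rejected.
    static boolean readsFromClosedScanner() {
        Scanner scanner = new Scanner("some input");
        scanner.close();
        try {
            scanner.next();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PointWithHashOnly a = new PointWithHashOnly(1, 2);
        PointWithHashOnly b = new PointWithHashOnly(1, 2);
        System.out.println("same hash code: " + (a.hashCode() == b.hashCode()));
        System.out.println("equal: " + a.equals(b));
        System.out.println("closed-scanner read rejected: " + readsFromClosedScanner());
    }
}
```

A structural check only needs the class declaration, whereas the scanner case requires knowing the order of the calls and that both calls target the same object.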
  3. Example-driven Detection

     user:

         class ?name extends Component {
           public void acceptVisitor(?type ?v) {
             System.out.println(?string);
             ?v.?visitMethod(this);
           }
         }

     tool (0.648):

         public class OnlyLoggingLeaf extends Component {
           public void acceptVisitor(ComponentVisitor v) {
             System.out.println("Only logging.");
           }
         }
         public class SillyLeaf extends OnlyLoggingLeaf {
           public void acceptVisitor(ComponentVisitor v) {
             super.acceptVisitor(v);
             ComponentVisitor temp = v;
             temp.visitSuperLogLeaf(this);
           }
         }

     Of course, if such a general-purpose pattern detection tool existed, you would still have to tell it somehow what patterns to look for. Wouldn't it be great if you could just give the tool an example implementation of the pattern you are looking for, and have the tool return all variants of this example in your code? In a nutshell, that's what I proposed in my dissertation: an example-driven approach to pattern detection.

     So, how does it work in practice? Imagine that you want to check whether all subclasses of a Component class have a method acceptVisitor that logs something before double dispatching to its parameter. Then you give the tool an example implementation as shown on the slide. It looks like regular Java code with meta-variables. For instance, the one in green substitutes both for the parameter of the method and the receiver of the double dispatching. Each result reported by the tool consists of bindings for the meta-variables. One of those results is shown on the bottom of the slide. As you can see, this particular result is a variant of the implementation we gave to the tool. Here, the acceptVisitor() method performs the required logging through a super call instead of directly. It also doesn't dispatch to its parameter, but to a temporary variable that aliases the parameter. The more of these variants the tool finds, the better. However, not all variants of the implementation are equal. That's why an example-driven pattern detection tool ranks the variants it finds based on their similarity to the given example.
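To see why a purely syntactic reading of such a template would miss the variant on slide 3, consider the following minimal sketch of strict, token-by-token matching with meta-variables. It is not the matching procedure of the tool presented in the talk: `StrictTemplateMatcher`, the tokenization and the token lists are assumptions made for illustration. Tokens starting with `?` are meta-variables, and every occurrence of a meta-variable has to be bound to the same code token.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical strict matcher: compares template and code token by token.
class StrictTemplateMatcher {
    // Returns the meta-variable bindings if the code matches the template,
    // or an empty Optional if a literal differs or a binding conflicts.
    static Optional<Map<String, String>> match(List<String> template, List<String> code) {
        if (template.size() != code.size()) {
            return Optional.empty();
        }
        Map<String, String> bindings = new HashMap<>();
        for (int i = 0; i < template.size(); i++) {
            String t = template.get(i);
            String c = code.get(i);
            if (t.startsWith("?")) {
                // A meta-variable keeps its first binding; later occurrences must agree.
                String previous = bindings.putIfAbsent(t, c);
                if (previous != null && !previous.equals(c)) {
                    return Optional.empty();
                }
            } else if (!t.equals(c)) {
                return Optional.empty();
            }
        }
        return Optional.of(bindings);
    }

    public static void main(String[] args) {
        // Simplified token sequences: a parameter declaration followed by the dispatch.
        List<String> template = Arrays.asList("?type", "?v", ";", "?v", ".", "?visitMethod", "(", "this", ")");
        List<String> direct = Arrays.asList("ComponentVisitor", "v", ";", "v", ".", "visitSuperLogLeaf", "(", "this", ")");
        List<String> aliased = Arrays.asList("ComponentVisitor", "v", ";", "temp", ".", "visitSuperLogLeaf", "(", "this", ")");
        System.out.println(match(template, direct));
        System.out.println(match(template, aliased));
    }
}
```

Under this strict reading the dispatch through `temp` is rejected, because `?v` is already bound to `v`. Recognizing that `temp` and `v` denote the same object requires consulting an analysis of the program, not just comparing tokens.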
  4. Motivation

     Uniform language for specifying behavioral and structural characteristics: existing specification languages are too specialized, e.g. temporal logic formulas over a control flow graph, or a constraint satisfaction problem over AST nodes. Familiar to developers, who often communicate using example snippets and diagrams. Recalls implicit implementation variants through static analyses: relieves developers from having to enumerate each variant and shields developers from intricate analysis results. Facilitates assessing reported variants, e.g. present in one, all, or some program executions.

     So, why did I investigate such an example-driven approach to pattern detection? First of all, code templates provide a uniform language for specifying the behavioral and structural characteristics of a pattern. Existing tools, in contrast, are tailored to one kind of characteristic. For instance, tools specialized in control flow characteristics might have you express them using temporal logic formulas over a control flow graph. Tools specialized in structural characteristics, on the other hand, might have you express a constraint satisfaction problem over AST nodes. Clearly, the differences between such highly specialized languages make it difficult to specify heterogeneously characterized patterns.

     Second, code templates align well with the way developers tend to communicate: through example snippets and diagrams. Third, having the tool recall implicit variants of the exemplified characteristics relieves users from having to enumerate all of them in a specification. It also shields users from the intricate program analyses that are needed to recall these variants. Fourth, not all variants of behavioral characteristics are created equal. Some show up in only one, in some, or in all program executions. Ranking these variants should facilitate assessing them.
  5. In the dissertation ...

     Now, motivating all of this took a lot longer in the actual dissertation. There, I discussed the dimensions in the design of a pattern detection tool, surveyed the existing tools on these dimensions, concluded that there was a need for a general-purpose tool, motivated a set of desiderata for such a tool, and evaluated the existing tools on these desiderata. I then used this evaluation to motivate the cornerstones of my approach. Today, I'll briefly discuss 3 of them: logic meta-programming, example-driven matching of code templates and domain-specific unification.
  6. Founding Cornerstone: LMP

     Specify characteristics through logic queries. Leave operational search to logic evaluator. Quantify over reified program representation: AST, CFG, PTA. ✓ expressive, declarative, ... ✘ exposes users to details of representation + reification.

     Logic Meta Programming is the founding cornerstone of my approach. It advocates specifying a pattern's characteristics through logic queries, and leaving the operational search for the pattern's instances to the logic evaluator. Which is, of course, a good thing. You can read the query on the slide as "give me a class that declares a method named 'foo' or one that is named 'bar'". LMP is something that has already been around for a decade or two. It has been used to detect patterns in ASTs, CFGs, and even PTA results. It is very expressive, but it exposes users to the details of such a representation and the way it was converted into a logic format.
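The query read out on slide 6 ("a class that declares a method named foo or one that is named bar") can be approximated in plain Java to show what quantifying over a reified program representation means. This is only a sketch: `ReifiedProgram`, `DeclaresMethod` and the fact format are hypothetical, and a hand-written loop stands in for the search that a logic evaluator would perform on the user's behalf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reification: the program is represented as a list of facts.
class ReifiedProgram {
    // One fact: class `className` declares a method named `methodName`.
    static final class DeclaresMethod {
        final String className;
        final String methodName;

        DeclaresMethod(String className, String methodName) {
            this.className = className;
            this.methodName = methodName;
        }
    }

    // All classes that declare a method named "foo" or one named "bar",
    // in order of first occurrence and without duplicates.
    static List<String> classesDeclaringFooOrBar(List<DeclaresMethod> facts) {
        List<String> result = new ArrayList<>();
        for (DeclaresMethod fact : facts) {
            boolean nameMatches = fact.methodName.equals("foo") || fact.methodName.equals("bar");
            if (nameMatches && !result.contains(fact.className)) {
                result.add(fact.className);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<DeclaresMethod> facts = Arrays.asList(
                new DeclaresMethod("A", "foo"),
                new DeclaresMethod("B", "baz"),
                new DeclaresMethod("C", "bar"),
                new DeclaresMethod("A", "bar"));
        System.out.println(classesDeclaringFooOrBar(facts));
    }
}
```

Even in this toy version the query author has to know how the facts are shaped, which is the drawback noted on the slide: users are exposed to the details of the representation and its reification.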
  7. Cornerstone: Example-Driven Specification

     Exemplify characteristics through code templates embedded in logic queries:

         if jtClassDeclaration(?class) { },
            ?method methodDeclarationHasName: ?visitMethod

     Matched according to multiple strategies that vary in leniency from AST-based to flow-based. Recall implicit variants of structural and control flow characteristics:

         ✓ { super.acceptVisitor(v);
             x.doSomething();
             v.visitSuperLogLeaf(this); }

         ✘ { System.out.println("Hello");
             ComponentVisitor temp = v;
             temp.visitSuperLogLeaf(this); }

     So, LMP is expressive, but difficult to use. The second cornerstone of my approach therefore advocates exemplifying pattern characteristics through code templates instead. By embedding templates in logic queries, they can be combined through logic connectives and multiple occurrences of the same meta-variable. On the slide, you can see a logic query that consists of two conditions. The first condition corresponds to our example implementation of the Component subclass. The second condition is not a template, but an ordinary logic condition. They are connected through the occurrences of the purple meta-variable. As a result, we will also find the method invoked by the visitXXX message.

     Now, code templates are nothing new in pattern detection tools. What is new, however, is that we match these in an example-driven manner according to multiple strategies. These strategies vary in leniency from strict AST-based (which is the predominant one among pattern detection tools) to a very lenient flow-based matching. The idea is that these strategies recall variants of structural and control flow characteristics that are implied by the semantics of the programming language. For instance, even an indirect subclass of Component with the acceptVisitor method on the left would be recognized as a variant of our implementation. It corresponds to the example, except that it contains additional instructions and performs the exemplified logging through a super call. However, to recall the pattern instance on the right, our tool needs to be able to recognize implicit variants of data flow characteristics. This brings us to the last cornerstone I will discuss.
  8. Cornerstone: Domain-Specific Unification

     Extensions ensure that implicit implementation variants unify:

         class MustAlias extends Component {
           public void acceptVisitor(ComponentVisitor v) {
             System.out.println("Hello");
             ComponentVisitor temp = v;
             temp.visitSuperLogLeaf(this);
           }
         }

         class ?name extends Component {
           public void acceptVisitor(?type ?v) {
             System.out.println(?string);
             ?v.?visitMethod(this);
           }
         }

     Unification extensions (each consults static analyses; the degree expresses the likelihood of resulting in a false positive and is propagated by the fuzzy logic cornerstone):

         AST node with AST node: identical; degree 1
         Qualified Type with Simple Type: denote same or co-variant return types; degree 1
         Expression with Expression: in must-alias or may-alias relation; degree 0.9 or 0.5
         Message Name with Method Name: message may invoke method according to dynamic or static receiver type; degree 0.5 or 0.4
         ...

     The domain-specific unification cornerstone consists of domain-specific extensions to the regular unification procedure we know from Prolog. It ensures that implicit implementation variants unify. In the code on the right, the first occurrence of the green variable v is bound to a parameter. The second occurrence of v is bound to a temporary variable. The domain-specific unification allows this because v and temp happen to evaluate to the same value at run-time. To determine this, it consults static analyses.

     The table on the slide lists some other unification extensions. The one in the second row unifies a qualified type with a simple type if both denote the same type or are co-variant return types. To this end, it consults a semantic analysis. The name of a message and the name of a method also unify if the message may invoke the method according to the static or the dynamic type of the receiver.

     As an extension may succeed where the plain unification procedure fails, it might result in false positives. We therefore associate unification degrees with each extension. They are shown in the last column. For instance, two expressions unify with a degree of 0.9 if they alias in every program execution, but with a degree of 0.5 if they alias only in some. All of these degrees are combined and used to compute the ranking of a detected result.
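How unification degrees could feed into a ranking can be sketched as follows. The degree constants are the ones listed on slide 8. Everything else is an assumption: the talk does not state the operator that combines degrees, so this sketch uses multiplication purely for illustration, and `UnificationDegrees`, `Result` and the example result names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class UnificationDegrees {
    // Degrees as listed on the slide.
    static final double IDENTICAL = 1.0;
    static final double MUST_ALIAS = 0.9;
    static final double MAY_ALIAS = 0.5;
    static final double DYNAMIC_RECEIVER_TYPE = 0.5;
    static final double STATIC_RECEIVER_TYPE = 0.4;

    // ASSUMPTION: degrees are combined by multiplication. The source only
    // says that the degrees "are combined"; the operator is not given.
    static double combine(double... degrees) {
        double result = 1.0;
        for (double degree : degrees) {
            result *= degree;
        }
        return result;
    }

    // A detected result together with its combined degree (hypothetical type).
    static final class Result {
        final String name;
        final double degree;

        Result(String name, double degree) {
            this.name = name;
            this.degree = degree;
        }
    }

    // Ranks results from most to least similar to the example.
    static List<Result> rank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Result r) -> r.degree).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Result> ranked = rank(Arrays.asList(
                new Result("mayAliasVariant", combine(MAY_ALIAS, DYNAMIC_RECEIVER_TYPE)),
                new Result("exactCopy", combine(IDENTICAL, IDENTICAL)),
                new Result("mustAliasVariant", combine(MUST_ALIAS, DYNAMIC_RECEIVER_TYPE))));
        for (Result r : ranked) {
            System.out.println(r.name + " " + r.degree);
        }
    }
}
```

Whatever operator is used, the intent stays the same: a result that needed fewer or more certain extensions ends up higher in the ranking than one that relied on a may-alias relation.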
  9. In Practice: Detecting Lapsed Observers

     On to some practice. The paper discusses how to detect possible lapsed observers in an example-driven manner. Those are observers that are added to a subject, but never removed.
  10. Example-driven Specification

          1  if jtClassDeclaration(?subjectClass){
          2    class ?subjectName {
          3      ?mod1List ?t1 ?observers = ?init;
          4      public ?t2 ?addObserver(?observerType ?observer) {
          5        ?observers.?add(?observer);
          6      }
          7      public ?t3 ?removeObserver(?observerType ?otherObserver) {
          8        ?observers.?remove(?otherObserver);
          9      }
          10     ?mod2List ?t4 ?notifyObservers(?param1List) {
          11       ?observers;
          12       ?observer.?update(?argList);
          13     }
          14   }
          15 },
          16 jtClassDeclaration(?observerClass){
          17   class ?observerName {
          18     ?mod3List ?t5 ?update(?argList) {}
          19   }
          20 },
          21 jtExpression(?register){ ?subject.?addObserver(?lapsed) },
          22 not(jtExpression(?unregister){ ?subject.?removeObserver(?lapsed) }),
          23 jtExpression(?alloc){ ?lapsed := new ?observerName(?argList) }

      Callouts on the slide: subject class, add method, update message, observer class, update method, add message, lapsed observer instance, instance creation.

      Rest assured, there is some reason to this madness. I've simply highlighted all occurrences of a variable in the same color. The specification for the lapsed observer consists of 3 parts. The first is a template that exemplifies the prototypical implementation of the subject class. Among others, it has a method addObserver (in orange) for registering an observer with the subject. It takes the observer as its parameter (in blue) and adds the observer to the purple field. It also has a notifyObservers method that notifies observers of state changes. Note that it sends a message (in yellow) to one of the previously added observers (in blue). The second part exemplifies an observer class. It is exemplified as a class in which the method invoked by the yellow update message resides. The lapsed observer instance is found as the gray argument to an addObserver invocation, which is never used as an argument to any removeObserver invocation. The last line finds the expression that created this observer instance.
  11. Example Instance

      The slide shows the specification of slide 10 next to the following program:

          class Point implements ChangeSubject {
            private HashSet observers;
            public void addObserver(utils.ChangeObserver o) {
              observers.add(o);
            }
            public void removeObserver(ChangeObserver o) {
              this.observers.remove(o);
            }
            public void notifyObservers() {
              for (Iterator e = observers.iterator(); e.hasNext();) {
                ((ChangeObserver)e.next()).refresh(this);
              }
            }
          }
          class Screen implements ChangeObserver {
            public void refresh(ChangeSubject s) { ... }
          }
          class Main {
            public static void main(String[] args) {
              Point p = new Point(5, 5);
              Screen s1 = new Screen("s1");
              Screen s2 = new Screen("s2");
              p.addObserver(s1);
              p.addObserver(s2);
              ...
              p.removeObserver(s2);
            }
          }

      Fig. 2. Domain-specific extensions of the unification procedure illustrated on the detection of lapsed listeners in the Observer design pattern.

      Unification extensions marked between specification and program: qualified type with simple type; expression with expression; expression with parameter name; message name with method name; class name with simple type. Legible rows of the accompanying table from the paper: an expression unifies with an expression if, according to an intra-procedural must-alias analysis or an inter-procedural may-alias analysis, both must or may evaluate to the same object at run-time; an expression unifies with the name of a variable declaration if the expression references the variable according to a semantic analysis.

      Here's an example of a lapsed listener, together with the unification extensions that were required to find it. The paper has all the details.

      Paper excerpt shown on the slide: Lines 21–23 exemplify ... the lapsed listener pitfall at the instance-level: as instances of the participating classes that exhibit the characteristics of the pitfall. They identify ?lapsed objects that are added to a ?subject (line 21), but never removed from it (line 22). The final condition is optional. It identifies the expression that instantiated the lapsed object. To this end, it uses the non-native operator := which unifies the logic variable on its left-hand side with the AST node that matches the code on its right-hand side. As a result, ?alloc will be bound to new Screen("s1") for the depicted program. Note that the depicted specification only detects possible lapsed listeners. It does not identify the point in the program's execution after which an observer is no longer needed, nor does it specify that the ?unregister expression should be ...
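For readers who want to run the example instance of slide 11, here is a self-contained reconstruction. Several parts are assumptions that the slide does not show: the two interfaces, the constructors, the body of `refresh`, the `isRegistered` helper and the `LapsedObserverDemo` class are hypothetical, and the qualified type `utils.ChangeObserver` is replaced by the simple type so that everything fits in one file.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Assumed shape of the interfaces; the slide only shows their names.
interface ChangeSubject {
}

interface ChangeObserver {
    void refresh(ChangeSubject s);
}

class Point implements ChangeSubject {
    private final Set<ChangeObserver> observers = new HashSet<>();
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public void addObserver(ChangeObserver o) {
        observers.add(o);
    }

    public void removeObserver(ChangeObserver o) {
        this.observers.remove(o);
    }

    public void notifyObservers() {
        for (Iterator<ChangeObserver> e = observers.iterator(); e.hasNext();) {
            e.next().refresh(this);
        }
    }

    // Hypothetical helper, added only to make the outcome observable.
    boolean isRegistered(ChangeObserver o) {
        return observers.contains(o);
    }
}

class Screen implements ChangeObserver {
    private final String name;
    int refreshCount = 0;

    Screen(String name) {
        this.name = name;
    }

    // Hypothetical body; the slide elides it.
    public void refresh(ChangeSubject s) {
        refreshCount++;
    }
}

class LapsedObserverDemo {
    // Mirrors main on the slide: both screens are added, only s2 is removed.
    static Point run(Screen s1, Screen s2) {
        Point p = new Point(5, 5);
        p.addObserver(s1);
        p.addObserver(s2);
        p.removeObserver(s2);
        return p;
    }

    public static void main(String[] args) {
        Screen s1 = new Screen("s1");
        Screen s2 = new Screen("s2");
        Point p = run(s1, s2);
        System.out.println("s1 still registered: " + p.isRegistered(s1));
        System.out.println("s2 still registered: " + p.isRegistered(s2));
    }
}
```

In this reconstruction `s1` stays registered with the subject, which matches the binding of ?alloc to new Screen("s1") reported in the paper excerpt.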
  12. Evaluation

      Result: a general-purpose detection tool for structural and behavioral characteristics, using descriptive specifications in a uniform language. Motivated each cornerstone through the desiderata it helps to fulfill, using running examples for each kind of characteristic. Evaluated the approach as a whole on the desiderata, using design patterns, µ-patterns and bug patterns. ✓ descriptive specifications ✓ most instances recalled with few false positives ✘ cardinality constraints difficult to exemplify.

      Now, how do you evaluate such a thing? I needed to evaluate the approach as a whole on the desiderata for a general-purpose pattern detection tool, but I also had to motivate its individual cornerstones. I therefore enabled the cornerstones one by one in my tool to demonstrate what desiderata they help to fulfill. The approach as a whole was evaluated by detecting instances of representative design patterns, micro-patterns and bug patterns. And of course, it worked well. A lot of specifications consisted solely of Java code with meta-variables. Only cardinality constraints such as "at least as many as" are easier to express using plain LMP.
  13. Future Work

      What to rank pattern detection results on? Similarity + imprecisions in analyses; severity for bug patterns? Need a corpus of programs in which pattern instances have been documented. Pattern specification formalisms that are even easier to use: generalizing pattern instances into example-driven specifications, search space exploration backed by program analyses. In general, make sure that our tools become part of every developer's toolbox, e.g. example-driven program transformation, e.g. example-driven history querying. But ... maybe ease-of-use is not the only adoption hurdle.

      What do I consider future work? First of all, I'm currently ranking results based on their similarity to the given example and on the imprecisions in the analyses that were needed to find each result. That seems ok for several patterns, perhaps except for bug patterns. Specialized bug detection tools rank bugs based on their severity. So there is room for future work here, although yesterday's keynote speaker seems to disagree :) In any case, to evaluate our tools, we need a corpus of programs in which pattern instances have been documented. That's a tremendous task, but someone has to do it. Perhaps we can do it collaboratively using a social website.

      There is also room for other specification formalisms that are even easier to use. Currently, I'm interested in search-based techniques from artificial intelligence to automatically generalize snippets of code into an example-driven specification. In general, I believe a lot of work still needs to be done to ensure our tools become part of every developer's toolbox. I'm thinking of specifying program transformations in an example-driven way or even querying the history of a program in an example-driven way. But maybe ease-of-use is not the only hurdle to the adoption of our tools. Empirical studies are needed to determine what is keeping our tools from the toolboxes.
  14. Lessons for Doctoral Students (highly subjective)

      Proponent of artifact-driven research: share with others, gain momentum; specification of artifact for reproducibility (in my case: meta-interpreters). Stand on the shoulders of giants ... or reinvent the wheel? SOUL, JDT, SOOT: thanks! But it often takes implementing an algorithm to understand its details. Be wary of analysis paralysis: trust your advisors when they say you have enough material ;) Anonymous: "getting a PhD is akin to getting a driver's license for doing research".

      Since this is the post-doctoral symposium, here are some of the lessons I learned that could be of use to doctoral students. Warning: these are highly subjective and personal.
  15. Download at soft.vub.ac.be/SOUL/