raybuse                                PhD Defense
                                         4.17.2012

       Automatically Describing
    Program Structure and Behavior




Readability   Runtime Behavior   Documentation



Code is Difficult to Understand.
Write New Code


                                         Change Code




                Read Code

“Understanding code is by far the activity at which
professional developers spend most of their time.”
      Peter Hallam. What Do Programmers Really Do Anyway?
      Microsoft Developer Network (MSDN) – C# Compiler. Jan 2006


Requirements → Design → Implementation → Verification → Maintenance

Maintenance accounts for about 70-90% of the total lifecycle
budget of a software project.

T. M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment.
R. C. Seacord, D. Plakosh, and G. A. Lewis. Modernizing Legacy Systems: Software Technologies,
#include                                     <math.h>
#include                                   <sys/time.h>
#include                                   <X11/Xlib.h>
#include                                  <X11/keysym.h>
                                          double L ,o ,P
                                         ,_=dt,T,Z,D=1,d,
                                         s[999],E,h= 8,I,
                                         J,K,w[999],M,m,O
                                        ,n[999],j=33e-3,i=
                                        1E3,r,t, u,v ,W,S=
                                        74.5,l=221,X=7.26,
                                        a,B,A=32.2,c, F,H;
                                        int N,q, C, y,p,U;
                                       Window z; char f[52]
                                    ; GC k; main(){ Display*e=
 XOpenDisplay( 0); z=RootWindow(e,0); for (XSetForeground(e,k=XCreateGC (e,z,0,0),BlackPixel(e,0))
; scanf("%lf%lf%lf",y +n,w+y, y+s)+1; y ++); XSelectInput(e,z= XCreateSimpleWindow(e,z,0,0,400,400,
0,0,WhitePixel(e,0) ),KeyPressMask); for(XMapWindow(e,z); ; T=sin(O)){ struct timeval G={ 0,dt*1e6}
; K= cos(j); N=1e4; M+= H*_; Z=D*K; F+=_*P; r=E*K; W=cos( O); m=K*W; H=K*T; O+=D*_*F/ K+d/K*E*_; B=
sin(j); a=B*T*D-E*W; XClearWindow(e,z); t=T*E+ D*B*W; j+=d*_*D-_*F*E; P=W*E*B-T*D; for (o+=(I=D*W+E
*T*B,E*d/K *B+v+B/K*F*D)*_; p<y; ){ T=p[s]+i; E=c-p[w]; D=n[p]-L; K=D*m-B*T-H*E; if(p [n]+w[ p]+p[s
]== 0|K <fabs(W=T*r-I*E +D*P) |fabs(D=t *D+Z *T-a *E)> K)N=1e4; else{ q=W/K *4E2+2e2; C= 2E2+4e2/ K
 *D; N-1E4&& XDrawLine(e ,z,k,N ,U,q,C); N=q; U=C; } ++p; } L+=_* (X*t +P*M+m*l); T=X*X+ l*l+M *M;
  XDrawString(e,z,k ,20,380,f,17); D=v/l*15; i+=(B *l-M*r -X*Z)*_; for(; XPending(e); u *=CS!=N){
                                   XEvent z; XNextEvent(e ,&z);
                                       ++*((N=XLookupKeysym
                                         (&z.xkey,0))-IT?
                                         N-LT? UP-N?& E:&
                                         J:& u: &h); --*(
                                         DN -N? N-DT ?N==
                                         RT?&u: & W:&h:&J
                                          ); } m=15*F/l;
                                          c+=(I=M/ l,l*H
                                          +I*M+a*X)*_; H
                                          =A*r+v*X-F*l+(
                                          E=.1+X*4.9/l,t
                                          =T*m/32-I*T/24
                                           )/S; K=F*M+(
                                           h* 1e4/l-(T+
                                           E*5*T*E)/3e2
                                           )/S-X*d-B*A;
                                           a=2.63 /l*d;
                                           X+=( d*l-T/S
                                            *(.19*E +a
                                            *.64+J/1e3
                                            )-M* v +A*
                                            Z)*_; l +=
                                            K *_; W=d;
                                            sprintf(f,
                                            "%5d %3d"
                                            "%7d",p =l
                                           /1.7,(C=9E3+
                              O*57.3)%0550,(int)i); d+=T*(.45-14/l*
                             X-a*130-J* .14)*_/125e2+F*_*v; P=(T*(47
                             *I-m* 52+E*94 *D-t*.38+u*.21*E) /1e2+W*
                             179*v)/2312; select(p=0,0,0,0,&G); v-=(
                              W*F-T*(.63*m-I*.086+m*E*19-D*25-.11*u
                               )/107e2)*_; D=cos(o); E=sin(o); } }
Java
What does this print?
class Change {
  public static void main(String[] args) {
      System.out.println(2.00 - 1.10);
    }
}

    Output: 0.8999999999999999

                      *Adapted from Josh Bloch, Jeremy Manson
What does this print?
import java.math.BigDecimal;

class Change {
  public static void main(String[] args) {
    BigDecimal payment = new BigDecimal(2.00);
    BigDecimal cost = new BigDecimal(1.10);
    System.out.println(payment.subtract(cost));
  }
}

 Output: 0.899999999999999911182158029987476766109466552734375
BigDecimal
public BigDecimal(double val)

Translates a double into a BigDecimal which is the exact
decimal representation of the double's binary floating-point
value. The scale of the returned BigDecimal is the smallest
value such that (10^scale × val) is an integer.




 http://docs.oracle.com/javase/6/docs/api/java/math/BigDecimal.html


What we should have done
import java.math.BigDecimal;

class Change {
  public static void main(String[] args) {
    BigDecimal payment = new BigDecimal("2.00");
    BigDecimal cost = new BigDecimal("1.10");
    System.out.println(payment.subtract(cost));
  }
}

   Output: 0.90
Confusing
Hard to Read


The Rest of this Talk
Modeling Code Readability      Predicting Runtime Behavior




                 Synthesizing Documentation




Hard to Read


How can we tell if code is readable?
“…
I do not like them in a box.
I do not like with a fox.
I do not like them in a house.
I do not like them with a mouse.
…”

"There once lived, in a sequestered part of the country of Devonshire, one
Mr. Godfrey Nickleby: a worthy gentleman, who, taking it into his head rather
late in life that he must get married, and not being young enough or rich
enough to aspire to the hand of a lady of fortune, had wedded an old flame
out of mere attachment, who in her turn had taken him for the same reason."
Flesch-Kincaid Readability
[Figure: grade-level scale with markers at 0.0, 10.0 (DOD MIL-M-38784B), 12.1, and 15.3]
Can this work for code?
Research questions:
• To what extent do humans agree on
  what code is readable?
• Can we derive an accurate descriptive
  model for readability?
• Does the model correlate significantly
  with software quality?
[Figure: annotator scores per snippet, ranging from "Less Readable" to "More Readable"; vertical bands imply agreement]
Quantifying Agreement

        Correlation Statistics
        • Pearson’s r – linear dependence
        • Spearman’s ρ – monotonic dependence
        • Kendall’s τ – counts bubble sort operations
        • Cohen’s κ – nominal agreement

        Absolute agreement is less important than relative agreement.
Quantifying Agreement

        • Perfect absolute agreement: ρ = 1
        • Perfect relative agreement: ρ = 1
        • Absolute disagreement: ρ = -1
        • Random agreement: ρ = 0
        • “Strong” agreement: ρ > 0.5
Quantifying Agreement With a Group

  Allan P. Jones, Lee A. Johnson, Mark C. Butler and Deborah S. Main.
  Apples and Oranges: An Empirical Comparison of Commonly Used
  Indices of Interrater Agreement.
  The Academy of Management Journal, Vol. 26, No. 3 (Sep., 1983), pp. 507-519

  • Compute all pairwise correlations: ρ0, ρ1, ρ2, ρ3, …, ρn-1
  • Return the average ρ:

      $\bar{\rho} = \frac{1}{n} \sum_{i=0}^{n-1} \rho_i$
[Figure: Spearman correlation between annotators (y-axis, 0 to 0.9) for annotators sorted by agreement (x-axis, 0 to 120)]
Building a Model
Flesch-Kincaid Readability

$0.39 \cdot \frac{wordCount}{sentenceCount} + 11.8 \cdot \frac{syllableCount}{wordCount} - 15.59$

$f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$

Features: the inputs $x_i$          Weights: the coefficients $\beta_i$
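To make the connection between the two formulas concrete, the sketch below evaluates Flesch-Kincaid as an instance of the generic linear model. The class and method names are illustrative assumptions; only the constants 0.39, 11.8, and 15.59 come from the slide.

```java
class FleschKincaid {
    // Generic linear model: f(x) = beta[0] + beta[1]*x[0] + ... + beta[n]*x[n-1].
    static double linear(double[] beta, double[] x) {
        double f = beta[0];
        for (int i = 0; i < x.length; i++) {
            f += beta[i + 1] * x[i];
        }
        return f;
    }

    // Flesch-Kincaid grade level: two features, three fixed weights.
    static double gradeLevel(int words, int sentences, int syllables) {
        double[] beta = {-15.59, 0.39, 11.8};          // weights
        double[] x = {
            (double) words / sentences,                // feature 1: words per sentence
            (double) syllables / words                 // feature 2: syllables per word
        };
        return linear(beta, x);
    }
}
```

For a text with 100 words, 10 sentences, and 150 syllables this gives 0.39 × 10 + 11.8 × 1.5 − 15.59 = 6.01.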
Features: Creativity / Intuition

Weights: Supervised Learning
   • Regression
   • Bayesian
   • Neural Net
   • SVM
   • …
Use training data from human study
Potential Code Readability Features

class Change {

    //Computes 2.00 - 1.10
    public static void main(String[] args) {
      BigDecimal payment = new BigDecimal("2.00");
      BigDecimal cost = new BigDecimal("1.10");
      System.out.println(payment.subtract(cost));
    }
}

Features highlighted on this snippet:
• Line Length
• Comments
• Identifier Length
• Blank Lines
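As a rough illustration of how such surface features could be measured, the sketch below counts them for a snippet. It is not the feature extractor from the published metric: the class name, the field names, and the crude tokenization (every word-like token counts as an identifier, keywords included, and only whole-line // comments are recognized) are simplifying assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReadabilityFeatures {
    // Crude identifier pattern; also matches keywords such as "int".
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    double avgLineLength;
    int maxLineLength;
    int commentLines;
    int blankLines;
    double avgIdentifierLength;

    static ReadabilityFeatures extract(String snippet) {
        ReadabilityFeatures f = new ReadabilityFeatures();
        String[] lines = snippet.split("\n");
        int totalChars = 0, words = 0, wordChars = 0;
        for (String line : lines) {
            totalChars += line.length();
            f.maxLineLength = Math.max(f.maxLineLength, line.length());
            String trimmed = line.trim();
            if (trimmed.isEmpty()) { f.blankLines++; continue; }
            if (trimmed.startsWith("//")) { f.commentLines++; continue; }
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                words++;
                wordChars += m.group().length();
            }
        }
        f.avgLineLength = (double) totalChars / lines.length;
        f.avgIdentifierLength = words == 0 ? 0 : (double) wordChars / words;
        return f;
    }
}
```

Each field would become one input x_i of the linear model; the weights are then learned from the human annotations rather than chosen by hand.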
Evaluation
Model agrees with humans as much as they agree with each other.
[Figure: Spearman correlation between annotators (y-axis, 0 to 0.9) for human annotators, sorted (x-axis, 0 to 120), with "our metric" and "average human" labeled]
Line length and number of identifiers are important.
Length of identifiers is not.
Readability and Software Quality
Benchmarks
12 Open Source Sourceforge Projects, Over 2M LOC




                                                   61
Mature projects tend to be more readable.
[Figure: readability by project maturity: alpha, beta, stable, mature]
63
JUnit 3.0 -> 4.0




               64
Readability Metric
• Metric (and source code) is freely
  available:
  arrestedcomputing.com/readability
• Has been used directly in several
  published papers.




                                       65
Hard to Read: 0.0014
Confusing: 0.82
Predicting Runtime Behavior
Approaches to Predicting Behavior

Dynamic Profiles
• Precise - full program path profiles
• Requires indicative workloads

Static Heuristics
• Cheap
• Only need program code
• Typically limited in scope
Static Heuristics
D. W. Wall. Predicting program behavior using real or estimated
profiles. In ACM Conf. on Programming Language Design and
Implementation (PLDI'91), pages 59-70, June 1991.


T. Ball and J. R. Larus. Branch prediction for free. In ACM Conf. on
Programming Language Design and Implementation (PLDI'93), pages
300-313, June 1993.


Boogerd, C. and Moonen, L. Prioritizing Software Inspection Results
using Static Profiling, Source Code Analysis and Manipulation, 2006.
SCAM '06. pp.149-160, Sept. 2006


                                                                       72
Intra-class static path profiles
• Precision similar to a dynamic profiler
• Workloads not required

                    HashTable
                        put()


                         rehash()

                          get()


                                            73
Key idea
public V put(K key , V value)
{
   if ( value == null )
     throw new Exception();

    if ( count >= threshold )
      rehash();

    index = key.hashCode() % length;

    table[index] = new Entry(key, value);
    count++;

    return value;
}
                    *from java.util.Hashtable jdk6.0


                                                       75
public V put(K key , V value)
{
   if ( value == null )                     // Exception
     throw new Exception();

    if ( count >= threshold )
      rehash();                             // Invocation that changes a lot of program state

    index = key.hashCode() % length;

    table[index] = new Entry(key, value);
    count++;

    return value;                           // Computation
}
                    *from java.util.Hashtable jdk6.0
Static Path Profiling
Research questions:
• What static code features are predictive
  of path execution frequency?
• Can we derive an accurate descriptive
  model for runtime behavior?
Approach
• Build a descriptive model of path execution
  frequency
• Features: length of path, presence of
  exceptions, number of variables written …
• Train and cross-validate on SPEC Java
  benchmarks
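A minimal sketch of the ranking idea follows, assuming a linear model over three of the features named above. The weights here are hypothetical placeholders chosen only so that exceptional and state-heavy paths score low; in the approach described, weights are learned from the benchmark profiles. The class, field, and method names are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class PathRanking {
    static class Path {
        final String name;
        final int length;               // number of statements on the path
        final boolean throwsException;  // path ends in a throw
        final int variablesWritten;     // fields or variables assigned on the path

        Path(String name, int length, boolean throwsException, int variablesWritten) {
            this.name = name;
            this.length = length;
            this.throwsException = throwsException;
            this.variablesWritten = variablesWritten;
        }
    }

    // Hypothetical weights (assumption): higher score = predicted to run more often.
    static double score(Path p) {
        return 1.0
             - 0.05 * p.length
             - 2.0 * (p.throwsException ? 1 : 0)
             - 0.1 * p.variablesWritten;
    }

    // Returns a copy of the paths, most frequently executed (predicted) first.
    static List<Path> rank(List<Path> paths) {
        List<Path> sorted = new ArrayList<>(paths);
        sorted.sort((a, b) -> Double.compare(score(b), score(a)));
        return sorted;
    }
}
```

Only the relative order of the scores is used, which matches the goal of predicting relative rather than absolute path frequencies.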



                                                78
Choose 5% of all paths and get 50% of runtime behavior.
[Figure: ranking by our metric vs. baseline random ranking]

Choose 1 path per method and get 94% of runtime behavior.
[Figure: ranking by our metric vs. baseline random ranking]
Applications for Profiles
• Profile guided optimization
• Complexity/Runtime estimation
• Anomaly detection
• Significance of difference between program
  versions
• Prioritizing output from other analyses
• Documentation

                                               81
Conclusion
• A formal model that statically predicts relative
  dynamic path execution frequencies
• The promise of helping other program
  analyses and transformations




                                                     82
Documentation Synthesis
Classic Approaches to Understandability

Improving (labor intensive)
• Code Reviews
• Training
• Languages
• Documentation

Compensating (doesn't solve underlying problem)
• Testing
• Verification
• Other Program Analyses
Use these tools…




                   87
Improving
                   • Code Reviews
                   • Training
                   • Languages
                   • Documentation

                        To do this.
Use these tools…




                                      88
The most significant barrier to code reuse is “software is too difficult to
understand or is poorly documented.”

NASA Software Reuse Working Group. Software Reuse Survey.
http://www.esdswg.com/softwarereuse/Resources/library/working_group_documents/survey2005
Source Code Documentation
• Describes some important aspect of the code
  in a way that’s easier to understand.

•   Explanations/Summaries of Behavior
•   Pre/Post Conditions, Caveats
•   Usage Examples
•   …

                                                90
Automatic Documentation




               Synthesis Tool
Program Code                    Documentation



                                           91
Automatic Documentation
•   Cheap
•   Always up-to-date
•   Complete
•   Well-defined trust properties
•   Structured (Searchable)



                                    92
Exceptions
IllegalStateException thrown when getLocation() is not Europe
Raymond P. L. Buse and Westley R. Weimer. Automatic Documentation
Inference for Exceptions. In International Symposium on Software
Testing and Analysis, Seattle, WA, USA, 2008.

Program Changes
When arg0 == null return -1 instead of arg0.toString()
Raymond P. L. Buse and Westley R. Weimer. Automatically documenting
program changes. In International Conference on Automated Software
Engineering, Antwerp, Belgium, 2010.

API Usage Examples
Iterator iter =
         SOMETHING.iterator();
while( iter.hasNext() )
{
   Object o = iter.next();
   //Do something with o
}
Raymond P. L. Buse and Westley Weimer. Synthesizing API Usage Examples. In
International Conference on Software Engineering [To Appear], Zurich, Switzerland, 2012.
This Talk: API Usage Examples
“The greatest obstacle to learning an
 API … is insufficient or inadequate
              examples”

   Martin Robillard. What Makes APIs Hard to Learn? Answers
   from Developers. IEEE Software, 26(6):27-34, 2009.
Synthesizing API Examples

Research questions:
• What makes a good example?
• Can we create good examples
  automatically?
Sources of Examples




                 JavaDoc




                           97
Search-based examples

BufferedReader reader = new BufferedReader(new
                                      InputStreamReader(page));
try {String line = reader.readLine();
while (line != null) {
if (line.matches(substituteWikiWord(wikiWord,newTopicPattern)))
{

Query: BufferedReader




                                                            98
Hand-Crafted Examples
                                             Example Data

FileOutputStream fos = new FileOutputStream("t.tmp");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeInt(12345);
oos.writeObject("Today");
oos.writeObject(new Date());
oos.close();
                               Query: java.io.ObjectOutputStream

Complete


                                                               99
Hand-Crafted Examples
                                      Abstract
                                    Initialization

int glyphIndex = ...;
GlyphMetrics metrics =
GlyphVector.getGlyphMetrics(glyphIndex);
int isStandard = metrics.isStandard();
float glyphAdvance = metrics.getAdvance();
Query: java.awt.font.GlyphMetrics




                                                     100
Hand-Crafted Examples
for(char c = iter.first();
     c != CharacterIterator.DONE;
     c = iter.next()) {
   processChar(c);
}                         Hole
Query: java.text.CharacterIterator

try {
file.delete();                                  Hole
} catch (IOException exc) {
  // failed to delete, do error handling here
}
return FileVisitResult.CONTINUE;
Query: java.nio.file.FileVisitor

                                                  101
Our API Examples

FileReader fReader; //initialized previously
BufferedReader br = new BufferedReader(fReader);
while(br.ready()) {
    String line = br.readLine();
    //do something with line
}
br.close();

Query: java.io.BufferedReader

Properties of this example:
• Common variable names
• Abstract initialization
• “Holes”
• Complete
Synthesis
Approach
• Mine usages from an existing program corpus
  – Similar to Specification Mining
• Learn common patterns
• Abstract representative examples




                                            105
[Diagram: Software Corpus + Search Term “BufferedReader” → Miner → Control Flow Graph Representation: <init> → ready → readLine → print → close]

[Diagram: the same graph, with annotations relevant to documentation:
   <init>: argument FileReader (itself with argument “foo.txt”), identifier br
   ready: predicate ready()
   readLine: returns String: line
   print: argument line
   close]

[Diagram: Software Corpus + Search Term “BufferedReader” → Miner → Many Concrete Uses]
Multiple Common Patterns
BufferedReader br = new BufferedReader(new FileReader("foo.in"));
while( br.ready() )
{
  String s = br.readLine();
  //Do something with s
}

BufferedReader br = new BufferedReader(new FileReader("foo.in"));
String s;
while( ( s = br.readLine() ) != null )
{
   //Do something with s
}


                                                                    109
Clustering: Group Similar Patterns

Many Concrete Uses → Clustering Operator → … Grouped Into A Few Different Patterns
Clustering
• k-medoids
• Distance metric captures
  difference in order of
  statements and types of
  objects
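For readers unfamiliar with k-medoids, here is a minimal sketch of the alternating assign/update variant over a precomputed distance matrix. It is generic: the distance used in the talk (statement order and object types) is not reproduced, so the matrix is simply an input. The class name, the naive initialization with the first k points, and the tie-breaking are all assumptions.

```java
class KMedoids {
    // dist[i][j] is the distance between usages i and j (symmetric, zero diagonal).
    // Returns, for each usage, the index (0..k-1) of the cluster it belongs to.
    static int[] cluster(double[][] dist, int k) {
        int n = dist.length;
        int[] medoids = new int[k];
        for (int c = 0; c < k; c++) medoids[c] = c;   // naive initialization (assumption)
        int[] assign = new int[n];
        boolean changed = true;
        while (changed) {
            // Assignment step: attach every point to its nearest medoid.
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist[p][medoids[c]] < dist[p][medoids[best]]) best = c;
                }
                assign[p] = best;
            }
            // Update step: within each cluster, pick the member that
            // minimizes the total distance to the other members.
            changed = false;
            for (int c = 0; c < k; c++) {
                int bestMedoid = medoids[c];
                double bestCost = cost(dist, assign, c, bestMedoid);
                for (int cand = 0; cand < n; cand++) {
                    if (assign[cand] != c) continue;
                    double candCost = cost(dist, assign, c, cand);
                    if (candCost < bestCost) {
                        bestCost = candCost;
                        bestMedoid = cand;
                    }
                }
                if (bestMedoid != medoids[c]) {
                    medoids[c] = bestMedoid;
                    changed = true;
                }
            }
        }
        return assign;
    }

    // Sum of distances from all members of cluster c to candidate medoid m.
    private static double cost(double[][] dist, int[] assign, int c, int m) {
        double sum = 0;
        for (int p = 0; p < assign.length; p++) {
            if (assign[p] == c) sum += dist[p][m];
        }
        return sum;
    }
}
```

Unlike k-means, each cluster center is an actual data point, so every cluster is represented by a concrete usage from the corpus, and only pairwise distances are needed.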




                             111
Abstraction

Many Concrete Examples → Abstraction Operator → … Into One Abstract Example
Concrete
if(iter.hasNext()) {
  set.add( iter.next() );
}

Concrete
if(iter.hasNext()) {
  print( iter.next() );
}

Abstraction Operator (least-upper-bound types, insert hole)

if(iter.hasNext()) {
  Object o = iter.next();
  //Do something with o
}
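The least-upper-bound step can be illustrated with plain Java reflection. This is a simplified sketch, not the tool's implementation: it considers only the superclass chain and ignores interfaces, and the class and method names are assumptions.

```java
class TypeLub {
    // Walks up a's superclass chain until it reaches a class that b can be
    // assigned to. Interfaces are ignored, so the fallback is Object.
    static Class<?> leastUpperBound(Class<?> a, Class<?> b) {
        Class<?> c = a;
        while (c != null && !c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c == null ? Object.class : c;
    }
}
```

When the concrete uses give the value unrelated types, the bound falls back to Object, as in the abstracted `Object o = iter.next();`; for Integer and Double it would be Number.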
Concrete
Iterator iter = set.iterator();
...

Concrete
Iterator iter = list.iterator();
...

Abstraction Operator (abstract initialization)

Iterator iter = SOMETHING.iterator();
...
Recap

Software Corpus + Search Term → Mine → Cluster → Abstract → Code Gen (yield well-formed source code)
Examples


Calendar calendar = Calendar.getInstance();
Date d = calendar.getTime();
//Do something with d
                              Query: java.util.Calendar




                                                      117
Examples

String regex; //initialized previously
String input; //initialized previously
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(input);
//Do something with m
                          Query: java.util.regex.Pattern




                                                       118
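To see how a reader might use the synthesized `java.util.regex.Pattern` example, the sketch below fills its holes with hypothetical values: the "initialized previously" strings become method parameters, and "Do something with m" becomes returning the first match. The class name `PatternExampleSketch` and the sample inputs are illustrative assumptions, not tool output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternExampleSketch {
    // Returns the first substring of input matching regex, or null if none.
    static String firstMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(input);
        // "Do something with m": here, look for the first match.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[0-9]+", "slide 118 of 135"));
    }
}
```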
Limitations
• Can’t always be perfectly precise
   – E.g., Aliasing, Types
   – Conservative analysis preserves correctness
• Common usage is not always best
   – E.g., poor exception handling
   – Guarantee representative examples
• Not all APIs have indicative patterns
• Some patterns are difficult to find
   – Message passing over network etc.

                                                   119
Evaluation
Name of Class
Two examples randomly drawn from {Our Tool, Human-Written, eXoaDoc}
Participant specifies preference
                                               121
Comparison to Code Search




                            122
Comparison to Human-Written




                              123
Use Cases
Warnings for Likely Mistakes




                               125
Auto Completion




                  126
Conclusion
0.0014




               Confusing
Hard to Read


                           128
Publication timeline, 2006–2012
  InfoVis ‘06, ISSTA ‘08 (two papers), ICSE ‘09, TSE ‘10, ASE ‘10, FoSER ‘10, OOPSLA ‘11, ICSE ‘12 (two papers)
                                                                                                         129
Publication timeline, 2006–2012 (subset)
  ISSTA ‘08 (two papers), ICSE ‘09, TSE ‘10, ASE ‘10, ICSE ‘12
                                                                           130
Readability
  Raymond P.L. Buse and Westley Weimer. Learning a Metric for Code Readability. IEEE Transactions on
  Software Engineering, 36(4):546–558, 2010.

  Raymond P.L. Buse and Westley Weimer. A Metric for Software Readability. In International Symposium on
  Software Testing and Analysis, pages 121–130, Seattle, WA, USA, 2008. ACM Distinguished Paper Award


Runtime Behavior
  Raymond P.L. Buse and Westley Weimer. The Road Not Taken: Estimating Path Execution Frequency Statically.
  In International Conference on Software Engineering, Vancouver, Canada, 2009.


Documentation
  Raymond P.L. Buse and Westley Weimer. Synthesizing API Usage Examples. In International
  Conference on Software Engineering [To Appear], Zurich, Switzerland, 2012.

  Raymond P.L. Buse and Westley Weimer. Automatically Documenting Program Changes. In
  International Conference on Automated Software Engineering, Antwerp, Belgium, 2010.

  Raymond P.L. Buse and Westley Weimer. Automatic Documentation Inference for Exceptions. In
  International Symposium on Software Testing and Analysis, Seattle, WA, USA, 2008.

                                              http://arrestedcomputing.com
                                                                                                        132
Thank You!
Questions?
Also ask me about:
• Documenting Exceptions
• Generating Commit Messages
• Conducting Human Studies




                               134
Automatically Describing Program Structure and Behavior (PhD Defense)

  • 1. raybuse PhD Defense 4.17.2012 Automatically Describing Program Structure and Behavior Readability Runtime Behavior Documentation 1
  • 2. Code is Difficult to Understand.
  • 3. Write New Code Change Code Read Code “Understanding code is by far the activity at which professional developers spend most of their time.” Peter Hallam. What Do Programmers Really Do Anyway? Microsoft Developer Network (MSDN) – C# Compiler. Jan 2006 3
  • 4. Requirements Maintenance accounts for about Design 70-90% of the total lifecycle Implementation budget of a software project. Verification Maintenance T. M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment. R. C. Seacord, D. Plakosh, and G. A. Lewis. Modernizing Legacy Systems: Software Technologies, 4
  • 5.
  • 6. #include <math.h> #include <sys/time.h> #include <X11/Xlib.h> #include <X11/keysym.h> double L ,o ,P ,_=dt,T,Z,D=1,d, s[999],E,h= 8,I, J,K,w[999],M,m,O ,n[999],j=33e-3,i= 1E3,r,t, u,v ,W,S= 74.5,l=221,X=7.26, a,B,A=32.2,c, F,H; int N,q, C, y,p,U; Window z; char f[52] ; GC k; main(){ Display*e= XOpenDisplay( 0); z=RootWindow(e,0); for (XSetForeground(e,k=XCreateGC (e,z,0,0),BlackPixel(e,0)) ; scanf("%lf%lf%lf",y +n,w+y, y+s)+1; y ++); XSelectInput(e,z= XCreateSimpleWindow(e,z,0,0,400,400, 0,0,WhitePixel(e,0) ),KeyPressMask); for(XMapWindow(e,z); ; T=sin(O)){ struct timeval G={ 0,dt*1e6} ; K= cos(j); N=1e4; M+= H*_; Z=D*K; F+=_*P; r=E*K; W=cos( O); m=K*W; H=K*T; O+=D*_*F/ K+d/K*E*_; B= sin(j); a=B*T*D-E*W; XClearWindow(e,z); t=T*E+ D*B*W; j+=d*_*D-_*F*E; P=W*E*B-T*D; for (o+=(I=D*W+E *T*B,E*d/K *B+v+B/K*F*D)*_; p<y; ){ T=p[s]+i; E=c-p[w]; D=n[p]-L; K=D*m-B*T-H*E; if(p [n]+w[ p]+p[s ]== 0|K <fabs(W=T*r-I*E +D*P) |fabs(D=t *D+Z *T-a *E)> K)N=1e4; else{ q=W/K *4E2+2e2; C= 2E2+4e2/ K *D; N-1E4&& XDrawLine(e ,z,k,N ,U,q,C); N=q; U=C; } ++p; } L+=_* (X*t +P*M+m*l); T=X*X+ l*l+M *M; XDrawString(e,z,k ,20,380,f,17); D=v/l*15; i+=(B *l-M*r -X*Z)*_; for(; XPending(e); u *=CS!=N){ XEvent z; XNextEvent(e ,&z); ++*((N=XLookupKeysym (&z.xkey,0))-IT? N-LT? UP-N?& E:& J:& u: &h); --*( DN -N? N-DT ?N== RT?&u: & W:&h:&J ); } m=15*F/l; c+=(I=M/ l,l*H +I*M+a*X)*_; H =A*r+v*X-F*l+( E=.1+X*4.9/l,t =T*m/32-I*T/24 )/S; K=F*M+( h* 1e4/l-(T+ E*5*T*E)/3e2 )/S-X*d-B*A; a=2.63 /l*d; X+=( d*l-T/S *(.19*E +a *.64+J/1e3 )-M* v +A* Z)*_; l += K *_; W=d; sprintf(f, "%5d %3d" "%7d",p =l /1.7,(C=9E3+ O*57.3)%0550,(int)i); d+=T*(.45-14/l* X-a*130-J* .14)*_/125e2+F*_*v; P=(T*(47 *I-m* 52+E*94 *D-t*.38+u*.21*E) /1e2+W* 179*v)/2312; select(p=0,0,0,0,&G); v-=( W*F-T*(.63*m-I*.086+m*E*19-D*25-.11*u )/107e2)*_; D=cos(o); E=sin(o); } } 6
  • 7. 7
  • 9. What does this print? class Change { public static void main(String[] args) { System.out.println(2.00 – 1.10); } } *Adapted from Josh Bloch, Jeremy Manson 9
  • 10. What does this print? class Change { public static void main(String[] args) { System.out.println(2.00 – 1.10); } } Output: 0.8999999999999999 10
  • 11. What does this print? import java.math.BigDecimal; class Change { public static void main(String[] args) { BigDecimal payment = new BigDecimal(2.00); BigDecimal cost = new BigDecimal(1.10); System.out.println(payment.subtract(cost)); } } 11
  • 12. What does this print? import java.math.BigDecimal; class Change { public static void main(String[] args) { BigDecimal payment = new BigDecimal(2.00); BigDecimal cost = new BigDecimal(1.10); System.out.println(payment.subtract(cost)); } } Output: 0.899999999999999911182158092 99874766109466552734375 12
  • 13. BigDecimal public BigDecimal(double val) Translates a double into a BigDecimal which is the exact decimal representation of the double's binary floating- point value. The scale of the returned BigDecimal is the smallest value such that (10scale × val) is an integer. http://docs.oracle.com/javase/6/docs/api/java/math/BigDecimal.html 13
  • 14. What we should have done import java.math.BigDecimal; class Change { public static void main(String[] args) { BigDecimal payment = new BigDecimal(“2.00”); BigDecimal cost = new BigDecimal(“1.10”); System.out.println(payment.subtract(cost)); } } Output: 0.90 14
  • 16. The Rest of this Talk Modeling Code Readability Predicting Runtime Behavior Synthesizing Documentation 16
  • 18. How can we tell if code is readable?
  • 19. 19
  • 20. “… I do not like them in a box. I do not like with a fox. "There once lived, in a sequestered I do not like them in a part of the country of Devonshire, one Mr. Godfrey Nickleby: a worthy house. gentleman, who, taking it into his I do not like them with a head rather late in life that he must mouse. get married, and not being young enough or rich enough to aspire to the …” hand of a lady of fortune, had wedded an old flame out of mere attachment, who in her turn had taken him for the same reason." 20
  • 21. 21
  • 25. Flesch-Kincaid Readability 10.0 15.3 0.0 DOD MIL-M-38784B 25
  • 26. Flesch-Kincaid Readability 10.0 12.1 15.3 0.0 DOD MIL-M-38784B 26
  • 27. Can this work for code? Research questions: • To what extent do humans agree on what code is readable? • Can we derive a accurate descriptive model for readability? • Does the model correlate significantly with software quality?
  • 29. Vertical bands imply agreement More Readable Less Readable 29
  • 32. Quantifying Agreement Correlation Statistics • Pearson’s r – linear dependence • Spearman’s ρ – monotonic dependence • Kendall’s τ – counts bubble sort operations • Cohen’s κ – nominal agreement 32
  • 33. Quantifying Agreement Correlation Statistics • Pearson’s r – linear dependence • Spearman’s ρ – monotonic dependence • Kendall’s τ – counts bubble sort operations • Cohen’s κ – nominal agreement Absolute agreement less important than relative agreement 33
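The "counts bubble sort operations" remark about Kendall’s τ can be made concrete. The sketch below is an illustration rather than the author's code, and it assumes rankings without ties. It orders one annotator's ranks by the other's, counts the adjacent swaps bubble sort needs to sort them (the number of discordant pairs), and maps that count to τ = 1 − 2·swaps / (n(n−1)/2). The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper; assumes each array holds distinct ranks (no ties).
class KendallTau {
    // Number of adjacent swaps bubble sort performs, i.e. the number of inversions.
    static int bubbleSortSwaps(int[] values) {
        int[] a = values.clone();
        int swaps = 0;
        for (int pass = 0; pass < a.length; pass++) {
            for (int i = 0; i + 1 < a.length - pass; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swaps++;
                }
            }
        }
        return swaps;
    }

    // ranksA[i] and ranksB[i] are the ranks two annotators gave to item i.
    static double tau(int[] ranksA, int[] ranksB) {
        int n = ranksA.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Visit items in the order annotator A ranked them.
        Arrays.sort(order, (x, y) -> Integer.compare(ranksA[x], ranksA[y]));
        int[] bInAOrder = new int[n];
        for (int i = 0; i < n; i++) bInAOrder[i] = ranksB[order[i]];
        double pairs = n * (n - 1) / 2.0;
        return 1.0 - 2.0 * bubbleSortSwaps(bInAOrder) / pairs;
    }

    public static void main(String[] args) {
        System.out.println(tau(new int[] {1, 2, 3}, new int[] {1, 2, 3}));
        System.out.println(tau(new int[] {1, 2, 3}, new int[] {3, 2, 1}));
    }
}
```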
  • 34. Quantifying Agreement Perfect Absolute Agreement ρ=1 34
  • 35. Quantifying Agreement Perfect Relative Agreement ρ=1 35
  • 36. Quantifying Agreement Absolute Disagreement ρ = -1 36
  • 37. Quantifying Agreement Random Agreement ρ=0 37
  • 38. Quantifying Agreement “Strong” Agreement ρ > 0.5 38
  • 39. Quantifying Agreement With a Group … 39
  • 40. Quantifying Agreement With a Group … Apples and Oranges: An Empirical Comparison of Commonly Used Indices of Interrater Agreement Allan P. Jones, Lee A. Johnson, Mark C. Butler and Deborah S. Main The Academy of Management Journal , Vol. 26, No. 3 (Sep., 1983), pp. 507-519 40
  • 41. Quantifying Agreement With a Group … Compute all pairwise correlations ρ0 ρ1 ρ2 ρ3 … ρn-1 41
  • 42. Quantifying Agreement With a Group … Return the average ρ: $\bar{\rho} = \frac{1}{n} \sum_{i=0}^{n-1} \rho_i$ 42
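The group-agreement procedure on these slides (compute every pairwise correlation, then average) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the study. Spearman’s ρ is computed as Pearson’s r over average ranks, the divisor is the number of annotator pairs, and the class and method names are hypothetical. An annotator who gives every snippet the same score has zero variance, which yields NaN here.

```java
import java.util.Arrays;

// Hypothetical helper for average pairwise Spearman correlation.
class Agreement {
    // Ranks 1..n; tied values receive the average of the positions they occupy.
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    // Pearson's r: linear dependence between x and y.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Spearman's rho: Pearson's r applied to ranks (monotonic dependence).
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    // scores[a][s] is the score annotator a gave to snippet s.
    static double averagePairwise(double[][] scores) {
        double sum = 0;
        int pairs = 0;
        for (int a = 0; a < scores.length; a++) {
            for (int b = a + 1; b < scores.length; b++) {
                sum += spearman(scores[a], scores[b]);
                pairs++;
            }
        }
        return sum / pairs;
    }
}
```

Because ranks rather than raw scores are compared, two annotators who order the snippets identically reach ρ = 1 even if one of them scores everything lower, which matches the slides' point that relative agreement matters more than absolute agreement.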
  • 43. [Figure: Spearman correlation between annotators (y-axis, 0 to 0.9) vs. annotators sorted by agreement (x-axis, 0 to 120)] 43
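  • 45. Flesch-Kincaid Readability $0.39 \left( \frac{\mathit{wordCount}}{\mathit{sentenceCount}} \right) + 11.8 \left( \frac{\mathit{syllableCount}}{\mathit{wordCount}} \right) - 15.59$ 45
  • 46. Flesch-Kincaid Readability $0.39 \left( \frac{\mathit{wordCount}}{\mathit{sentenceCount}} \right) + 11.8 \left( \frac{\mathit{syllableCount}}{\mathit{wordCount}} \right) - 15.59$ $f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$ 46
  • 47. Flesch-Kincaid Readability $0.39 \left( \frac{\mathit{wordCount}}{\mathit{sentenceCount}} \right) + 11.8 \left( \frac{\mathit{syllableCount}}{\mathit{wordCount}} \right) - 15.59$ $f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$ Features ($x_i$) 47
  • 48. Flesch-Kincaid Readability $0.39 \left( \frac{\mathit{wordCount}}{\mathit{sentenceCount}} \right) + 11.8 \left( \frac{\mathit{syllableCount}}{\mathit{wordCount}} \right) - 15.59$ $f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$ Features ($x_i$) Weights ($\beta_i$) 48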
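To make the connection between the Flesch-Kincaid formula and the general linear model explicit, here is a small sketch. The coefficients 0.39, 11.8 and −15.59 come from the slide; the class name, method names and the sample counts in `main` are assumptions chosen for illustration.

```java
// Hypothetical helper showing Flesch-Kincaid as an instance of a linear model.
class FleschKincaid {
    // Grade level from raw counts, using the coefficients shown on the slide.
    static double gradeLevel(int words, int sentences, int syllables) {
        return 0.39 * ((double) words / sentences)
             + 11.8 * ((double) syllables / words)
             - 15.59;
    }

    // General form f(x) = b0 + b1*x1 + ... + bn*xn.
    static double linear(double intercept, double[] weights, double[] features) {
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up text statistics: 100 words, 5 sentences, 150 syllables.
        System.out.println(gradeLevel(100, 5, 150));
        // Same value via the linear model: features are words/sentence and syllables/word.
        System.out.println(linear(-15.59, new double[] {0.39, 11.8}, new double[] {20.0, 1.5}));
    }
}
```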
  • 49. Features: Creativity / Intuition. Weights: Supervised Learning • Regression • Bayesian • Neural Net • SVM • … Use training data from human study 49
  • 50. Potential Code Readability Features class Change { //Computes 2.00 - 1.10 public static void main(String[] args) { BigDecimal payment = new BigDecimal("2.00"); BigDecimal cost = new BigDecimal("1.10"); System.out.println(payment.subtract(cost)); } } 50
  • 51. Potential Code Readability Features class Change { //Computes 2.00 - 1.10 public static void main(String[] args) { BigDecimal payment = new BigDecimal("2.00"); BigDecimal cost = new BigDecimal("1.10"); System.out.println(payment.subtract(cost)); } } Line Length 51
  • 52. Potential Code Readability Features class Change { //Computes 2.00 - 1.10 public static void main(String[] args) { BigDecimal payment = new BigDecimal("2.00"); BigDecimal cost = new BigDecimal("1.10"); System.out.println(payment.subtract(cost)); } } Comments 52
  • 53. Potential Code Readability Features class Change { //Computes 2.00 - 1.10 public static void main(String[] args) { BigDecimal payment = new BigDecimal("2.00"); BigDecimal cost = new BigDecimal("1.10"); System.out.println(payment.subtract(cost)); } } Identifier Length 53
  • 54. Potential Code Readability Features class Change { //Computes 2.00 - 1.10 public static void main(String[] args) { BigDecimal payment = new BigDecimal("2.00"); BigDecimal cost = new BigDecimal("1.10"); System.out.println(payment.subtract(cost)); } } Blank Lines 54
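The surface features named on these slides (line length, comments, identifier length, blank lines) are cheap to compute from raw text. The sketch below is a rough illustration and not the feature extractor behind the published metric. It treats any line starting with `//` as a comment, and it treats every alphabetic token (keywords and words inside comments included) as an "identifier". All names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately crude feature extractor for a code snippet.
class SnippetFeatures {
    // Crude stand-in for identifiers: also matches keywords and comment words.
    static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static double avgLineLength(String code) {
        String[] lines = code.split("\n", -1);
        int total = 0;
        for (String line : lines) total += line.length();
        return (double) total / lines.length;
    }

    static int blankLines(String code) {
        int count = 0;
        for (String line : code.split("\n", -1)) {
            if (line.trim().isEmpty()) count++;
        }
        return count;
    }

    // Counts only single-line comments that start the line.
    static int commentLines(String code) {
        int count = 0;
        for (String line : code.split("\n", -1)) {
            if (line.trim().startsWith("//")) count++;
        }
        return count;
    }

    static double avgWordLength(String code) {
        Matcher m = WORD.matcher(code);
        int total = 0, count = 0;
        while (m.find()) {
            total += m.group().length();
            count++;
        }
        return count == 0 ? 0.0 : (double) total / count;
    }
}
```

A vector of such values per snippet would play the role of the features x in the linear model, with the weights learned from the human annotations.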
  • 56. Model agrees with humans as much as they agree with each other [Figure: Spearman correlation between annotators (y-axis, 0 to 0.9) vs. human annotators, sorted (x-axis, 0 to 120); marked levels: our metric, average human] 56
  • 58. Line length and number of identifiers are important 58
  • 59. Line length and number of identifiers are important; length of identifiers is not 59
  • 61. Benchmarks 12 Open Source Sourceforge Projects, Over 2M LOC 61
  • 62. Mature projects tend to be more readable alpha beta stable mature 62
  • 64. JUnit 3.0 -> 4.0 64
  • 65. Readability Metric • Metric (and source code) is freely available: arrestedcomputing.com/readability • Has been used directly in several published papers. 65
  • 67. 0.0014 0.82 Confusing Hard to Read 67
  • 70. Approaches to Predicting Behavior Dynamic Profiles: • Precise - full program path profiles • Requires indicative workloads Static Heuristics: • Cheap • Only need program code • Typically limited in scope 70
  • 72. Static Heuristics D. W. Wall. Predicting program behavior using real or estimated profiles. In ACM Conf. on Programming Language Design and Implementation (PLDI'91), pages 59-70, June 1991. T. Ball and J. R. Larus. Branch prediction for free. In ACM Conf. on Programming Language Design and Implementation (PLDI'93), pages 300-313, June 1993. Boogerd, C. and Moonen, L. Prioritizing Software Inspection Results using Static Profiling, Source Code Analysis and Manipulation, 2006. SCAM '06. pp.149-160, Sept. 2006 72
  • 73. Intra-class static path profiles • Precision similar to a dynamic profiler • Workloads not required HashTable put() rehash() get() 73
  • 75. public V put(K key , V value) { if ( value == null ) throw new Exception(); if ( count >= threshold ) rehash(); index = key.hashCode() % length; table[index] = new Entry(key, value); count++; return value; } *from java.util.Hashtable jdk6.0 75
  • 76. public V put(K key , V value) { if ( value == null ) throw new Exception(); // Exception if ( count >= threshold ) rehash(); // Invocation that changes a lot of program state. index = key.hashCode() % length; table[index] = new Entry(key, value); count++; return value; // Computation } *from java.util.Hashtable jdk6.0 76
  • 77. Static Path Profiling Research questions: • What static code features are predictive of path execution frequency? • Can we derive an accurate descriptive model for runtime behavior?
  • 78. Approach • Build a descriptive model of path execution frequency • Features: length of path, presence of exceptions, number of variables written … • Train and cross-validate on SPEC Java benchmarks 78
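As a rough picture of what a descriptive model over path features might look like, the sketch below scores each path with a weighted sum of three of the features listed above and ranks paths by that score. Everything here is hypothetical. The weights are placeholders chosen only to reflect the intuition from the put() slide (exceptional paths and paths that change a lot of state are less likely); they are not learned values, and the actual model was trained on the SPEC benchmarks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ranking paths by a linear score over static features.
class PathRanker {
    static final class PathFeatures {
        final String name;
        final int length;               // number of statements on the path
        final boolean throwsException;  // path ends in a throw
        final int variablesWritten;     // rough proxy for state changed

        PathFeatures(String name, int length, boolean throwsException, int variablesWritten) {
            this.name = name;
            this.length = length;
            this.throwsException = throwsException;
            this.variablesWritten = variablesWritten;
        }
    }

    // PLACEHOLDER weights for illustration only; a real model would learn them.
    static double score(PathFeatures p) {
        return -0.1 * p.length
             - 2.0 * (p.throwsException ? 1 : 0)
             - 0.05 * p.variablesWritten;
    }

    // Highest score first, i.e. the path predicted to run most often comes first.
    static List<PathFeatures> rank(List<PathFeatures> paths) {
        List<PathFeatures> sorted = new ArrayList<>(paths);
        sorted.sort((a, b) -> Double.compare(score(b), score(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<PathFeatures> paths = new ArrayList<>();
        paths.add(new PathFeatures("throw", 2, true, 0));
        paths.add(new PathFeatures("rehash", 6, false, 10));
        paths.add(new PathFeatures("common", 5, false, 3));
        for (PathFeatures p : rank(paths)) {
            System.out.println(p.name + " " + score(p));
        }
    }
}
```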
  • 79. Choose 5% of all paths and get 50% of runtime behavior Ranking by our metric Baseline: random ranking 79
  • 80. Choose 1 path per method and get 94% of runtime behavior Ranking by our metric Baseline: random ranking 80
  • 81. Applications for Profiles • Profile guided optimization • Complexity/Runtime estimation • Anomaly detection • Significance of difference between program versions • Prioritizing output from other analyses • Documentation 81
  • 82. Conclusion • A formal model that statically predicts relative dynamic path execution frequencies • The promise of helping other program analyses and transformations 82
  • 85. Classic Approaches to Understandability Improving: • Code Reviews • Training • Languages • Documentation Compensating: • Testing • Verification • Other Program Analyses 85
  • 86. Classic Approaches to Understandability Improving: • Code Reviews • Training • Languages • Documentation Compensating: • Testing • Verification • Other Program Analyses Callouts: Doesn't solve underlying problem; Labor Intensive 86
  • 88. Improving • Code Reviews • Training • Languages • Documentation To do this. Use these tools… 88
  • 89. The most significant barrier to code reuse is “software is too difficult to understand or is poorly documented.” NASA Software Reuse Working Group. Software Reuse Survey. http://www.esdswg.com/softwarereuse/Resources/library/working_group_documents/survey2005
  • 90. Source Code Documentation • Describes some important aspect of the code in a way that’s easier to understand. • Explanations/Summaries of Behavior • Pre/Post Conditions, Caveats • Usage Examples • … 90
  • 91. Automatic Documentation Synthesis Tool Program Code Documentation 91
  • 92. Automatic Documentation • Cheap • Always up-to-date • Complete • Well-defined trust properties • Structured (Searchable) 92
  • 93. Exceptions: IllegalStateException thrown when getLocation() is not Europe. Raymond P. L. Buse and Westley R. Weimer. Automatic Documentation Inference for Exceptions. In International Symposium on Software Testing and Analysis, Seattle, WA, USA, 2008. Program Changes: When arg0 == null return -1 instead of arg0.toString(). Raymond P.L. Buse and Westley R. Weimer. Automatically documenting program changes. In International Conference on Automated Software Engineering, Antwerp, Belgium, 2010. API Usage Examples: Iterator iter = SOMETHING.iterator(); while( iter.hasNext() ) { Object o = iter.next(); //Do something with o } Raymond P. L. Buse and Westley Weimer. Synthesizing API Usage Examples. In International Conference on Software Engineering [To Appear], Zurich, Switzerland, 2012. 93
  • 94. API Usage Examples (This Talk) Iterator iter = SOMETHING.iterator(); while( iter.hasNext() ) { Object o = iter.next(); //Do something with o } Raymond P. L. Buse and Westley Weimer. Synthesizing API Usage Examples. In International Conference on Software Engineering [To Appear], Zurich, Switzerland, 2012. 94
  • 95. “The greatest obstacle to learning an API … is insufficient or inadequate examples” Martin Robillard. What Makes APIs Hard to Learn? Answers from Developers. IEEE Software, 26(6):27-34, 2009.
  • 96. Synthesizing API Examples Research questions: • What makes a good example? • Can we create good examples automatically?
  • 97. Sources of Examples JavaDoc 97
  • 98. Search-based examples BufferedReader reader = new BufferedReader(new InputStreamReader(page)); try {String line = reader.readLine(); while (line != null) { if (line.matches(substituteWikiWord(wikiWord,newTopicPattern))) { Query: BufferedReader 98
  • 99. Hand-Crafted Examples FileOutputStream fos = new FileOutputStream("t.tmp"); ObjectOutputStream oos = new ObjectOutputStream(fos); oos.writeInt(12345); oos.writeObject("Today"); oos.writeObject(new Date()); oos.close(); Query: java.io.ObjectOutputStream (callouts: Example Data, Complete) 99
  • 100. Hand-Crafted Examples Abstract Initialization int glyphIndex = ...; GlyphMetrics metrics = GlyphVector.getGlyphMetrics(glyphIndex); int isStandard = metrics.isStandard(); float glyphAdvance = metrics.getAdvance(); Query: java.awt.font.GlyphMetrics 100
  • 101. Hand-Crafted Examples for(char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) { processChar(c); } Query: java.text.CharacterIterator (callout: Hole) try { file.delete(); } catch (IOException exc) { // failed to delete, do error handling here } return FileVisitResult.CONTINUE; Query: java.nio.file.FileVisitor (callout: Hole) 101
  • 102. Our API Examples FileReader fReader; //initialized previously BufferedReader br = new BufferedReader(fReader); while(br.ready()) { String line = br.readLine(); //do something with line } br.close(); Query: java.io.BufferedReader 102
  • 103. Our API Examples FileReader fReader; //initialized previously BufferedReader br = new BufferedReader(fReader); while(br.ready()) { String line = br.readLine(); //do something with line } br.close(); Query: java.io.BufferedReader (callouts: Common variable names, Abstract Initialization, “Holes”, Complete) 103
  • 105. Approach • Mine usages from an existing program corpus – Similar to Specification Mining • Learn common patterns • Abstract representative examples 105
  • 106. Search Term “BufferedReader” + Software Corpus → Miner → Control Flow Graph Representation: <init> → ready → readLine → print → close 106
  • 107. Search Term “BufferedReader” + Software Corpus → Miner → Control Flow Graph With Annotations Relevant to Documentation: <init> (argument: FileReader, with argument “foo.txt”; identifier: br) → ready (predicate: ready()) → readLine (returns String: line) → print (argument: line) → close 107
  • 108. Search Term Software “BufferedReader” Corpus Miner ... Many Concrete Uses 108
  • 109. Multiple Common Patterns BufferedReader br = new BufferedReader(new FileReader("foo.in")); while( br.ready() ) { String s = br.readLine(); //Do something with s } BufferedReader br = new BufferedReader(new FileReader("foo.in")); String s; while( ( s = br.readLine() ) != null ) { //Do something with s } 109
  • 110. Clustering: Group Similar Patterns Many Concrete Uses → Clustering Operator → … Grouped Into A Few Different Patterns 110
  • 111. Clustering • k-medoids • Distance metric captures difference in order of statements and types of objects 111
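A compact picture of k-medoids over mined usages is sketched below. It is an illustration under stated assumptions, not the tool's implementation. Each usage is reduced to a sequence of call names, and the distance is plain edit distance over those sequences, which captures statement order but not the object types the slide's metric also considers. Initial medoids are passed in explicitly to keep the sketch deterministic. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical k-medoids sketch over call-name sequences.
class UsageClustering {
    // Edit distance between two call sequences (insert, delete, substitute cost 1).
    static int distance(List<String> a, List<String> b) {
        int[][] d = new int[a.size() + 1][b.size() + 1];
        for (int i = 0; i <= a.size(); i++) d[i][0] = i;
        for (int j = 0; j <= b.size(); j++) d[0][j] = j;
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                int sub = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + sub);
            }
        }
        return d[a.size()][b.size()];
    }

    // Assigns every usage to the cluster of its nearest medoid.
    static int[] assign(List<List<String>> usages, int[] medoids) {
        int[] cluster = new int[usages.size()];
        for (int i = 0; i < usages.size(); i++) {
            int best = 0;
            for (int c = 1; c < medoids.length; c++) {
                if (distance(usages.get(i), usages.get(medoids[c]))
                        < distance(usages.get(i), usages.get(medoids[best]))) {
                    best = c;
                }
            }
            cluster[i] = best;
        }
        return cluster;
    }

    // Sum of distances from a candidate medoid to all members of cluster c.
    static long cost(List<List<String>> usages, int[] cluster, int c, int candidate) {
        long total = 0;
        for (int i = 0; i < usages.size(); i++) {
            if (cluster[i] == c) total += distance(usages.get(i), usages.get(candidate));
        }
        return total;
    }

    // Alternates assignment and medoid update until medoids stop changing.
    static int[] kMedoids(List<List<String>> usages, int[] initialMedoids, int maxIters) {
        int[] medoids = initialMedoids.clone();
        for (int iter = 0; iter < maxIters; iter++) {
            int[] cluster = assign(usages, medoids);
            boolean changed = false;
            for (int c = 0; c < medoids.length; c++) {
                int best = medoids[c];
                long bestCost = cost(usages, cluster, c, best);
                for (int cand = 0; cand < usages.size(); cand++) {
                    if (cluster[cand] != c) continue;
                    long candCost = cost(usages, cluster, c, cand);
                    if (candCost < bestCost) { bestCost = candCost; best = cand; }
                }
                if (best != medoids[c]) { medoids[c] = best; changed = true; }
            }
            if (!changed) break;
        }
        return assign(usages, medoids);
    }

    public static void main(String[] args) {
        List<List<String>> usages = Arrays.asList(
            Arrays.asList("<init>", "ready", "readLine", "close"),
            Arrays.asList("<init>", "ready", "readLine", "print", "close"),
            Arrays.asList("<init>", "read", "skip", "mark", "reset"),
            Arrays.asList("<init>", "read", "skip", "mark", "reset", "close"));
        System.out.println(Arrays.toString(kMedoids(usages, new int[] {0, 2}, 10)));
    }
}
```

A medoid is always one of the mined usages, so each cluster comes with a concrete representative that the later abstraction step can start from.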
  • 112. Abstraction Many Concrete Examples Abstraction Operator … Into One Abstract Example 112
  • 113. Concrete: if(iter.hasNext()) { set.add( iter.next() ); } Concrete: if(iter.hasNext()) { print( iter.next() ); } Abstraction Operator (least-upper-bound types, insert hole): if(iter.hasNext()) { Object o = iter.next(); //Do something with o } 113
  • 114. Concrete Iterator iter = set.iterator(); ... Concrete Iterator iter = list.iterator(); ... Abstraction Operator Abstract Initialization Iterator iter = SOMETHING.iterator(); ... 114
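The "least-upper-bound types" step of the abstraction operator can be pictured with reflection: given the types seen in two concrete uses, pick the nearest class both extend. The sketch below is a simplification made for illustration. It walks only the superclass chain and ignores interfaces, so it is not a full least-upper-bound computation over Java's type lattice, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: nearest common superclass of two classes (interfaces ignored).
class TypeAbstraction {
    static Class<?> leastUpperBound(Class<?> a, Class<?> b) {
        // Collect a and all of its superclasses.
        Set<Class<?>> ancestors = new HashSet<>();
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            ancestors.add(c);
        }
        // The first class on b's chain that a also has is the nearest common one.
        for (Class<?> c = b; c != null; c = c.getSuperclass()) {
            if (ancestors.contains(c)) return c;
        }
        // Reached only for interfaces or primitives, which have no superclass chain.
        return Object.class;
    }

    public static void main(String[] args) {
        System.out.println(leastUpperBound(Integer.class, Double.class));
        System.out.println(leastUpperBound(String.class, Integer.class));
    }
}
```

When the concrete uses share nothing more specific, the result is `Object`, which is consistent with the `Object o = iter.next();` line in the abstract example.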
  • 115. Recap Software Corpus Mine Cluster Abstract Search Term 115
  • 116. Recap Software Corpus Mine Cluster Abstract Search Term Yield Well-Formed Code Gen Source Code 116
  • 117. Examples Calendar calendar = Calendar.getInstance(); Date d = calendar.getTime(); //Do something with d Query: java.util.Calendar 117
  • 118. Examples String regex; //initialized previously String input; //initialized previously Pattern pattern = Pattern.compile(regex); Matcher m = pattern.matcher(input); //Do something with m Query: java.util.regex.Pattern 118
  • 119. Limitations • Can’t always be perfectly precise – E.g., Aliasing, Types – Conservative analysis preserves correctness • Common usage is not always best – E.g., poor exception handling – Guarantee representative examples • Not all APIs have indicative patterns • Some patterns are difficult to find – Message passing over network etc. 119
  • 121. Name of Class Two examples randomly drawn from {Our Tool, Human-Written, eXoaDoc} Participant specifies preference 121
  • 122. Comparison to Code Search 122
  • 125. Warnings for Likely Mistakes 125
  • 128. 0.0014 Confusing Hard to Read 128
  • 129. 2006 2007 2008 2009 2010 2011 2012 InfoVis ‘06 ISSTA ‘08 ICSE ‘09 ASE ‘10 OOPSLA ‘11 ICSE ‘12 ISSTA ‘08 TSE ‘10 FoSER ‘10 ICSE ‘12 129
  • 130. 2006 2007 2008 2009 2010 2011 2012 ISSTA ‘08 ICSE ‘09 ASE ‘10 ICSE ‘12 ISSTA ‘08 TSE ‘10 130
  • 131. 2006 2007 2008 2009 2010 2011 2012 ISSTA ‘08 ICSE ‘09 ASE ‘10 ICSE ‘12 ISSTA ‘08 TSE ‘10 131
  • 132. Readability Raymond P.L. Buse and Westley Weimer. Learning a Metric for Code Readability. IEEE Transactions on Software Engineering, 36(4):546–558, 2010. Raymond P.L. Buse and Westley Weimer. A Metric for Software Readability. In International Symposium on Software Testing and Analysis, pages 121–130, Seattle, WA, USA, 2008. ACM Distinguished Paper Award Runtime Behavior Raymond P.L. Buse and Westley Weimer. The Road Not Taken: Estimating path execution frequency statically. In International Conference on Software Engineering, Vancouver, CA, 2009. Documentation Raymond P.L. Buse and Westley Weimer. Synthesizing API Usage Examples. In International Conference on Software Engineering [To Appear], Zurich, Switzerland, 2012. Raymond P.L. Buse and Westley Weimer. Automatically Documenting Program Changes. In International Conference on Automated Software Engineering, Antwerp, Belgium, 2010. Raymond P.L. Buse and Westley Weimer. Automatic Documentation Inference for Exceptions. In International Symposium on Software Testing and Analysis, Seattle, WA, USA, 2008. http://arrestedcomputing.com 132
  • 134. Questions? Also ask me about: • Documenting Exceptions • Generating Commit Messages • Conducting Human Studies 134