Learning from
6,000 Projects
Mining Models in the Large


       Andreas Zeller
     Saarland University
Saarbrücken
Visual Computing Institute
Some numbers
•   ~70 PhD advisors in computer science
•   ≥ 300 PhD students in computer science
•   ~60 new PhD graduates per year
•   ~60 new MSc graduates per year
•   800–1400 € per month as a PhD stipend
    (+ laptop & office • starting right after BSc • all courses in English)
Two Graduates




Michael Backes   Andrej Rybalchenko
 TR35 in 2009       TR35 in 2010
Michael Backes            Andrej Rybalchenko
secure protocols          loop termination

buffer overflow           resource leaks

information flow          liveness

secure protocols          loop termination

             easy to specify
             hard to verify
hard to specify
sorting

∀i ∈ {0, …, |x'|} : x'[i] < x'[i + 1]
|x| = |x'|
∀i ∈ {0, …, |x|} : ∃i' ∈ {0, …, |x'|} : x[i] = x'[i']
∀i' ∈ {0, …, |x'|} : ∃i ∈ {0, …, |x|} : x'[i'] = x[i]

                      easy to verify
                      hard to specify
is-sorted(x') ∧ is-permutation(x, x')

           still hard to specify
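To illustrate why such a specification is easy to verify once it has been written down, here is a minimal sketch of the two predicates. The names SortSpec, isSorted, isPermutation and satisfiesSortSpec are assumptions made for this example, not part of the talk. Two deliberate deviations from the slide: isSorted uses <= rather than the strict <, so duplicates are allowed, and isPermutation compares multisets, which is stricter than the element-wise formula.

```java
import java.util.Arrays;

// Hypothetical helper, not from the talk: checks the sorting postcondition
// is-sorted(x') AND is-permutation(x, x') for int arrays.
class SortSpec {
    // True if every element is <= its successor.
    static boolean isSorted(int[] xs) {
        for (int i = 0; i + 1 < xs.length; i++) {
            if (xs[i] > xs[i + 1]) return false;
        }
        return true;
    }

    // True if both arrays hold the same multiset of values.
    static boolean isPermutation(int[] x, int[] y) {
        if (x.length != y.length) return false;
        int[] a = x.clone();
        int[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    // The full postcondition: output is sorted and a permutation of input.
    static boolean satisfiesSortSpec(int[] input, int[] output) {
        return isSorted(output) && isPermutation(input, output);
    }
}
```

Checking the result is a few lines; the hard part the slides point at is stating the property in the first place.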
microsoft word              mobile phones

travel booking            operating systems

airplane control           banking systems




                easy to verify
               hard to specify
hard to specify

new language • duplicate effort • can’t abstract from details
specification crisis
mine specifications
from 6,000 projects
Specifications

∀i ∈ {0, …, |x'|} : x'[i] < x'[i + 1]
|x| = |x'|
∀i ∈ {0, …, |x|} : ∃i' ∈ {0, …, |x'|} : x[i] = x'[i']
∀i' ∈ {0, …, |x'|} : ∃i ∈ {0, …, |x|} : x'[i'] = x[i]

              pre- and postconditions
Specifications
               auth()!
<init>()
                                   openPort()
               socket: null                       socket: ¬null
           state: NOT_CON                         state: PLAIN


            quit()                              auth()
                              socket: ¬null
                              state: AUTH




                     finite state models
OP-Miner
                       Usage Models                      Temporal Properties

                                                          hasNext ≺ next
Program                                                   hasNext ≺ hasNext
               iter.hasNext ()    iter.next ()            next ≺ hasNext
                                                          next ≺ next




                Anomalies                            Patterns

              hasNext ≺ next
          ✓   hasNext ≺ hasNext                  hasNext ≺ next
              hasNext ≺ next                     hasNext ≺ hasNext
          ✗   hasNext ≺ hasNext
public Stack createStack () {
  Random r = new Random ();
  int n = r.nextInt ();
  Stack s = new Stack ();
  int i = 0;
  while (i < n) {
    s.push (rand (r));
    i++;
  }
  s.push (-1);
  return s;
}
Random r = new Random ();


                int n = r.nextInt ();


                Stack s = new Stack ();


                int i = 0;


        i < n   i < n
                                     i++;
s.push (-1);    s.push (rand (r));
Stack s = new Stack ();




s.push (-1);   s.push (rand (r));
s.<init>()

  s.push (_)

s.push (_)
Random r = new Random ();


int n = r.nextInt ();




s.push (rand (r));
r.<init> ()



r.nextInt ()

  Utils.rand (r)
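The step from a usage model to temporal properties can be sketched as a reachability check: "a ≺ b" holds if some path through the model takes a transition labelled a and later one labelled b. The class UsageModel, its integer state numbering, and the method names below are assumptions for illustration only; the slides do not show how OP-Miner implements this step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a usage model as a labelled graph, and the
// temporal properties "a ≺ b" that follow from it.
class UsageModel {
    static final String PRECEDES = " \u227A "; // the "≺" sign

    // One labelled edge: from --call--> to.
    private static final class Transition {
        final int from;
        final String call;
        final int to;
        Transition(int from, String call, int to) {
            this.from = from;
            this.call = call;
            this.to = to;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();

    void add(int from, String call, int to) {
        transitions.add(new Transition(from, call, to));
    }

    // States reachable from start (including start itself).
    private Set<Integer> reachable(int start) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!seen.add(s)) continue;
            for (Transition t : transitions) {
                if (t.from == s) work.push(t.to);
            }
        }
        return seen;
    }

    // Collect "a ≺ b" for every pair of edges where b can follow a.
    Set<String> temporalProperties() {
        Set<String> props = new TreeSet<>();
        for (Transition first : transitions) {
            Set<Integer> after = reachable(first.to);
            for (Transition second : transitions) {
                if (after.contains(second.from)) {
                    props.add(first.call + PRECEDES + second.call);
                }
            }
        }
        return props;
    }
}
```

For the stack model above (s.<init>() followed by repeated s.push(_) and a final s.push(_)), adding the edges 0 --<init>--> 1, 1 --push--> 1 and 1 --push--> 2 gives the two properties "<init> ≺ push" and "push ≺ push".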
Methods vs. Properties
                         Temporal Properties
                    start ≺    lock ≺      eof ≺
                     stop      unlock      close
                                                         Pattern
           get()
Methods




          open()


          hello()


          parse()
                                               Support
Discovering Anomalies
                         Temporal Properties
                    start ≺    lock ≺      eof ≺
                     stop      unlock      close
                                                   Anomaly
           get()                 ✘
Methods




          open()


          hello()


          parse()
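The idea behind the methods-versus-properties table can be sketched as follows: a pattern is a set of temporal properties, its support is the number of methods that exhibit all of them, and an anomaly is a method that misses exactly one property of a well-supported pattern. This is a simplified stand-in under stated assumptions. The class AnomalyFinder, the "exactly one missing" rule, and the sample data in the usage note are invented for illustration, and the sketch does not show how patterns are discovered in the first place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of support counting and anomaly detection over a
// methods x temporal-properties table.
class AnomalyFinder {
    // Number of methods whose property set contains the whole pattern.
    static int support(Map<String, Set<String>> methods, Set<String> pattern) {
        int count = 0;
        for (Set<String> props : methods.values()) {
            if (props.containsAll(pattern)) count++;
        }
        return count;
    }

    // Methods that miss exactly one property of a multi-property pattern,
    // returned in alphabetical order.
    static List<String> anomalies(Map<String, Set<String>> methods, Set<String> pattern) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : methods.entrySet()) {
            Set<String> missing = new HashSet<>(pattern);
            missing.removeAll(e.getValue());
            if (missing.size() == 1 && pattern.size() > 1) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

With made-up data in which open(), hello() and parse() all exhibit both "start < stop" and "lock < unlock" while get() exhibits only "start < stop", the pattern has support 3 and get() is reported as the single anomaly.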
AspectJ
for (Iterator iter = itdFields.iterator();
    iter.hasNext();) {
   ...
   for (Iterator iter2 = worthRetrying.iterator();
        iter.hasNext();) {    // should be iter2
       ...
   }
}
public void visitNEWARRAY (NEWARRAY o) {
  byte t = o.getTypecode ();
  if (!((t == Constants.T_BOOLEAN) ||
        (t == Constants.T_CHAR) ||
         ...
         (t == Constants.T_LONG))) {
     constraintViolated (o, "(...) '+t+' (...)");    // should be double quotes
  }
}
Name internalNewName (String[] identifiers) {
  ...
  for (int i = 1; i < count; i++) {
     SimpleName name = new SimpleName(this);
     name.internalSetIdentifier(identifiers[i]);
     ...
  }
  ...
}
should stay as is
public String getRetentionPolicy ()
{
  ...
  for (Iterator it = ...; it.hasNext();)
  {
      ... = it.next();
      ...
      return retentionPolicy;
  }
  ...
}
should be fixed
44% of violations
are defects or code smells
mine specifications
across thousands of projects
Wisdom of the crowds




                Francis
                Galton
lightweight parsing
Target Languages
  Java        C++        C        PHP    JavaScript

Similar syntax
           {...}             ;           foo()

Similar keywords
      while         if       switch     return
Lightweight Parser

Source Code      Abstract Representation      Temporal Properties
(language-independent lightweight parsing)

Source Code

int j;
int fA;
int fB = open("newFile");
fA = open("myFile");
j = 7;
while (j > 3) {
   read(fA);
   write(fB, "Hello");
   j--;
}

close(fA);
close(fB);

Abstract Representation

fB: open(CONST)
fA: open(CONST)
Loop:
   read(fA)
   write(fB, CONST)
close(fA)
close(fB)

fA: open(CONST)     read(fA)             close(fA)
fB: open(CONST)     write(fB, CONST)     close(fB)

Temporal Properties

fA: open() < read()      open() < close()     read() < read()       read() < close()
fB: open() < write()     open() < close()     write() < write()     write() < close()
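The first stage, going from source text to an abstract representation without a full language front end, might look like the following regex-based sketch. Everything here is an assumption made for illustration: the class LightweightParser, the method abstractCalls, and the regular expression are not taken from the talk, and the sketch only handles simple call statements ending in a semicolon, replaces string literals with CONST, and does not mark loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of lightweight parsing: pick out call statements
// with a regular expression instead of building a full syntax tree.
class LightweightParser {
    // Matches `f(args);`, optionally preceded by `x = ` or `type x = `.
    // Group 1: assignment target (may be null), group 2: function name,
    // group 3: raw argument text.
    private static final Pattern CALL = Pattern.compile(
        "(?:(?:\\w+\\s+)?(\\w+)\\s*=\\s*)?(\\w+)\\s*\\(([^()]*)\\)\\s*;");

    // Returns one abstract entry per call, in source order.
    static List<String> abstractCalls(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            String target = m.group(1);
            String function = m.group(2);
            // Abstract away concrete string literals.
            String args = m.group(3).trim().replaceAll("\"[^\"]*\"", "CONST");
            String call = function + "(" + args + ")";
            result.add(target == null ? call : target + ": " + call);
        }
        return result;
    }
}
```

Because the pattern relies only on braces, semicolons and call syntax shared by C-like languages, the same sketch would apply largely unchanged to Java, C++, PHP or JavaScript snippets of this shape, which is the point of the lightweight approach.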
thousands of projects
[Bar charts: 201,321,237 lines of code in 6,097 C projects]
6,097 C projects
201,321,237 lines of code
5,985,193 functions
15,803,766 properties (“f < g”)
6 GB database
18 hours analysis time
       single core
11 million lines of code per hour
11 seconds per project
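The two throughput figures follow from the totals above; as a quick check of the arithmetic (rounded values, using only the numbers on this slide):

```latex
\[
\frac{201{,}321{,}237\ \text{lines}}{18\ \text{h}} \approx 11.2 \times 10^{6}\ \text{lines/h},
\qquad
\frac{18 \times 3600\ \text{s}}{6{,}097\ \text{projects}} \approx 10.6\ \text{s/project}
\]
```

Both agree with the rounded values of 11 million lines per hour and 11 seconds per project.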
static int dcc_listen_init (…) {
    dcc->sok = socket(…);
    if (…) {
        while (…) {
            … = bind (dcc->sok, …);
        }
        /* with a small port range, reUseAddr is needed */
        setsockopt (dcc->sok, …, SO_REUSEADDR, …);    /* should be called before bind() */
    }
    listen (dcc->sok, …);
}
static int find_file (…)
{
    DIR *dirp;
    struct dirent *dirinfo;
    …
    dirp = opendir(".");
    if (dirp == NULL)
    {
        …
    }
    while ((dirinfo = readdir(dirp)) != NULL)
    {
        …
    }
    rewinddir(dirp);    /* should call closedir() instead */
    return 1;
}
Platform
Check my Code

•   Check your code against
    the wisdom of Linux

•   Builds on millions of
    mined specifications

•   Detects problems no
    other tool can detect

Database available for download

             www.checkmycode.org
specification crisis
microsoft word                mobile phones

travel booking               operating systems

airplane control             banking systems

                   easy to mine
Challenges

•   Mining complete specifications
•   Finding relevant abstractions
•   Producing readable specifications
•   Integrating specification mining
    and programming
Andrzej Wasylkowski   Christian Lindig   Natalie Gruska
Summary
Summary
Summary
Summary
Summary
Summary

Learning from 6,000 projects mining specifications in the large

  • 1.
    Learning from 6,000 Projects MiningModels in the Large Andreas Zeller Saarland University
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
    Saarbrücken ® Visual Computing Institute
  • 12.
  • 13.
  • 14.
    Some numbers • ~70 PhD advisors in computer science
  • 15.
    Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science
  • 16.
    Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year
  • 17.
    Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year • ~60 new MSc graduates per year
  • 18.
    Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year • ~60 new MSc graduates per year • 800–1400 € per month as a PhD stipend (+ laptop & office • starting right after BSc • all courses in English)
  • 19.
    Two Graduates Michael Backes Andrej Rybalchenko TR35 in 2009 TR35 in 2010
  • 20.
    Michael Backes Andrej Rybalchenko
  • 21.
    secure protocols Andrej Rybalchenko
  • 22.
    secure protocols loop termination
  • 23.
    secure protocols loop termination hard to verify
  • 24.
    secure protocols loop termination hard to verify
  • 25.
    information ow secure protocols loop termination hard to verify
  • 26.
    information ow liveness secure protocols loop termination hard to verify
  • 27.
    buffer over ow informationow liveness secure protocols loop termination hard to verify
  • 28.
    buffer over ow resource leaks information ow liveness secure protocols loop termination hard to verify
  • 29.
    buffer over ow resource leaks information ow liveness secure protocols loop termination easy to specify hard to verify
  • 30.
  • 31.
  • 32.
    ∀i ∈ {0,. . . , |x |} : x [i] < x [i + 1] |x| = |x | ∀i ∈ {0, . . . , |x|} : ιi ∈ {0, . . . , |x |} : x[i] = x [i ] ∀i ∈ {0, . . . , |x |} : ιi ∈ {0, . . . , |x|} : x [i ] = x[i] hard to specify
  • 33.
    ∀i ∈ {0,. . . , |x |} : x [i] < x [i + 1] |x| = |x | ∀i ∈ {0, . . . , |x|} : ιi ∈ {0, . . . , |x |} : x[i] = x [i ] ∀i ∈ {0, . . . , |x |} : ιi ∈ {0, . . . , |x|} : x [i ] = x[i] easy to verify hard to specify
  • 34.
    is-sorted(x ) ∧is-permutation(x, x ) still hard to specify
  • 36.
  • 37.
  • 38.
  • 39.
    microsoft word mobile phones travel booking airplane control
  • 40.
    microsoft word mobile phones travel booking operating systems airplane control
  • 41.
    microsoft word mobile phones travel booking operating systems airplane control banking systems
  • 42.
    microsoft word mobile phones travel booking operating systems airplane control banking systems hard to specify
  • 43.
    microsoft word mobile phones travel booking operating systems airplane control banking systems easy to verify hard to specify
  • 44.
  • 45.
    hard to specify newlanguage • duplicate effort • can’t abstract from details
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
    Speci cations ∀i ∈{0, . . . , |x |} : x [i] < x [i + 1] |x| = |x | ∀i ∈ {0, . . . , |x|} : ιi ∈ {0, . . . , |x |} : x[i] = x [i ] ∀i ∈ {0, . . . , |x |} : ιi ∈ {0, . . . , |x|} : x [i ] = x[i] pre- and postconditions
  • 51.
    Speci cations auth()! <init>() openPort() socket: null socket: ¬null state: NOT_CON state: PLAIN quit() auth() socket: ¬null state: AUTH nite state models
  • 52.
  • 53.
  • 54.
    OP-Miner Usage Models Program iter.hasNext () iter.next ()
  • 55.
    OP-Miner Usage Models Temporal Properties hasNext ≺ next Program hasNext ≺ hasNext iter.hasNext () iter.next () next ≺ hasNext next ≺ next
  • 56.
    OP-Miner Usage Models Temporal Properties hasNext ≺ next Program hasNext ≺ hasNext iter.hasNext () iter.next () next ≺ hasNext next ≺ next Patterns hasNext ≺ next hasNext ≺ hasNext
  • 57.
    OP-Miner Usage Models Temporal Properties hasNext ≺ next Program hasNext ≺ hasNext iter.hasNext () iter.next () next ≺ hasNext next ≺ next Anomalies Patterns hasNext ≺ next ✓ hasNext ≺ hasNext hasNext ≺ next hasNext ≺ next hasNext ≺ hasNext ✗ hasNext ≺ hasNext
  • 58.
    OP-Miner Usage Models Temporal Properties hasNext ≺ next Program hasNext ≺ hasNext iter.hasNext () iter.next () next ≺ hasNext next ≺ next Anomalies Patterns hasNext ≺ next ✓ hasNext ≺ hasNext hasNext ≺ next hasNext ≺ next hasNext ≺ hasNext ✗ hasNext ≺ hasNext
  • 60.
    public Stack createStack() { Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; while (i < n) { s.push (rand (r)); i++; } s.push (-1); return s; }
  • 61.
    public Stack createStack() { Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; while (i < n) { s.push (rand (r)); i++; } s.push (-1); return s; }
  • 62.
    Random r =new Random (); public Stack createStack () { Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; while (i < n) { s.push (rand (r)); i++; } s.push (-1); return s; }
  • 63.
    Random r =new Random (); public Stack createStack () { Random r = new Random (); int n = r.nextInt (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; Stack s = new Stack (); while (i < n) { s.push (rand (r)); i++; int i = 0; } s.push (-1); return s; }
  • 64.
    Random r =new Random (); public Stack createStack () { Random r = new Random (); int n = r.nextInt (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; Stack s = new Stack (); while (i < n) { s.push (rand (r)); i++; int i = 0; } s.push (-1); i < n return s; i++; } s.push (rand (r));
  • 65.
    Random r =new Random (); public Stack createStack () { Random r = new Random (); int n = r.nextInt (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; Stack s = new Stack (); while (i < n) { s.push (rand (r)); i++; int i = 0; } s.push (-1); i < n i < n return s; i++; } s.push (-1); s.push (rand (r));
  • 66.
    Random r =new Random (); public Stack createStack () { Random r = new Random (); int n = r.nextInt (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; Stack s = new Stack (); while (i < n) { s.push (rand (r)); i++; int i = 0; } s.push (-1); i < n i < n return s; i++; } s.push (-1); s.push (rand (r));
  • 67.
    Random r =new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n i < n i++; s.push (-1); s.push (rand (r));
  • 68.
    Stack s =new Stack (); s.push (-1); s.push (rand (r));
  • 69.
    s.<init>() s.push(_) s.push (_)
  • 70.
    Random r =new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n i < n i++; s.push (-1); s.push (rand (r));
  • 71.
    Random r =new Random (); int n = r.nextInt (); s.push (rand (r));
  • 72.
  • 73.
    OP-Miner: Program → Usage Models (iter.hasNext (), iter.next ()) → Temporal Properties (hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next) → Patterns (hasNext ≺ next, hasNext ≺ hasNext) → Anomalies (hasNext ≺ next ✓, hasNext ≺ hasNext ✗)
  • 74.
    OP-Miner: Program → Usage Models (iter.hasNext (), iter.next ()) → Temporal Properties (hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next) → Patterns (hasNext ≺ next, hasNext ≺ hasNext) → Anomalies (hasNext ≺ next ✓, hasNext ≺ hasNext ✗)
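A minimal sketch of how temporal properties of the form "a ≺ b" might be collected from observed call sequences. This is an illustration under assumptions, not OP-Miner's actual implementation: the class `TemporalProperties`, the method `mine`, and the flat list-of-calls input are all hypothetical, and `<` stands in for `≺` to keep the source ASCII-only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of OP-Miner.
final class TemporalProperties {
    // Collects "a < b" for every pair where call a occurs before call b
    // in at least one of the given call sequences.
    static Set<String> mine(List<List<String>> sequences) {
        Set<String> properties = new TreeSet<>();
        for (List<String> calls : sequences) {
            for (int i = 0; i < calls.size(); i++) {
                for (int j = i + 1; j < calls.size(); j++) {
                    properties.add(calls.get(i) + " < " + calls.get(j));
                }
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Calls seen on iter for two iterations of:
        //   while (iter.hasNext()) { iter.next(); }
        List<String> loop = List.of("hasNext", "next", "hasNext", "next", "hasNext");
        System.out.println(mine(List.of(loop)));
    }
}
```

Because the loop may run more than once, a single iterator usage already produces all four properties listed on the slide, including next ≺ next.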
  • 75.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods:
  • 76.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get()
  • 77.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open()
  • 78.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello()
  • 79.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  • 80.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  • 81.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  • 82.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Highlighted: Pattern
  • 83.
    Methods vs. Properties. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Highlighted: Pattern, Support
  • 84.
    Discovering Anomalies. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  • 85.
    Discovering Anomalies. Temporal properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Marked: Anomaly (✘)
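The idea of a pattern, its support, and an anomaly can be sketched as follows. This is a simplified illustration, not the concept-analysis machinery the notes allude to: a pattern is a set of properties, its support is the number of methods exhibiting all of them, and an anomaly is a method that misses exactly one. The class `AnomalyFinder` and the method-to-property table in `example()` are invented for illustration; the slide's actual matrix entries were not recoverable from the transcript.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the table below is made up for illustration.
final class AnomalyFinder {
    // Support: number of methods whose properties include the whole pattern.
    static int support(Map<String, Set<String>> methods, Set<String> pattern) {
        int count = 0;
        for (Set<String> props : methods.values()) {
            if (props.containsAll(pattern)) {
                count++;
            }
        }
        return count;
    }

    // Anomalies: methods that lack exactly one property of the pattern.
    static List<String> violations(Map<String, Set<String>> methods, Set<String> pattern) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : methods.entrySet()) {
            long missing = pattern.stream().filter(p -> !e.getValue().contains(p)).count();
            if (missing == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Invented method/property table (an assumption, not the slide's data).
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> m = new LinkedHashMap<>();
        m.put("get()", Set.of("start < stop", "lock < unlock"));
        m.put("open()", Set.of("start < stop", "lock < unlock", "eof < close"));
        m.put("hello()", Set.of("start < stop", "lock < unlock"));
        m.put("parse()", Set.of("start < stop", "eof < close"));
        return m;
    }

    public static void main(String[] args) {
        Set<String> pattern = Set.of("start < stop", "lock < unlock");
        System.out.println("support = " + support(example(), pattern));
        System.out.println("violations = " + violations(example(), pattern));
    }
}
```

With this invented table, three methods support the pattern and one method lacks a single property, so the ratio of supporters to violators gives a rough confidence that the deviation is a real anomaly.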
  • 86.
  • 87.
    for (Iterator iter= itdFields.iterator(); iter.hasNext();) { ... for (Iterator iter2 = worthRetrying.iterator(); iter.hasNext();) { ... } }
  • 88.
    for (Iterator iter= itdFields.iterator(); iter.hasNext();) { ... for (Iterator iter2 = worthRetrying.iterator(); iter.hasNext();) { ... } } [slide annotation on the inner loop condition: should be iter2]
  • 89.
    public void visitNEWARRAY(NEWARRAY o) { byte t = o.getTypecode (); if (!((t == Constants.T_BOOLEAN) || (t == Constants.T_CHAR) || ... (t == Constants.T_LONG))) { constraintViolated (o, "(...) '+t+' (...)"); } }
  • 90.
    public void visitNEWARRAY(NEWARRAY o) { byte t = o.getTypecode (); if (!((t == Constants.T_BOOLEAN) || (t == Constants.T_CHAR) || ... (t == Constants.T_LONG))) { constraintViolated (o, "(...) '+t+' (...)"); } } [slide annotation on '+t+': should be double quotes]
  • 91.
    Name internalNewName (String[] identifiers) { ... for (int i = 1; i < count; i++) { SimpleName name = new SimpleName(this); name.internalSetIdentifier(identifiers[i]); ... } ... }
  • 92.
    Name internalNewName (String[] identifiers) { ... for (int i = 1; i < count; i++) { SimpleName name = new SimpleName(this); name.internalSetIdentifier(identifiers[i]); ... } ... } [slide annotation: should stay as is]
  • 93.
    public String getRetentionPolicy() { ... for (Iterator it = ...; it.hasNext();) { ... = it.next(); ... return retentionPolicy; } ... }
  • 94.
    public String getRetentionPolicy() { ... for (Iterator it = ...; it.hasNext();) { ... = it.next(); ... return retentionPolicy; } ... } [slide annotation: should be fixed]
  • 96.
    44% of violations are defects or code smells
  • 97.
  • 98.
    Mine specifications across thousands of projects
  • 99.
    Wisdom of the crowds · Francis Galton · "No, not on the left either"
  • 100.
    Wisdom of the crowds · Francis Galton · "No, not on the left either"
  • 102.
  • 103.
    Target Languages Java C++ C PHP Javascript
  • 104.
    Target Languages Java C++ C PHP Javascript Similar syntax {...} ; foo()
  • 105.
    Target Languages Java C++ C PHP Javascript Similar syntax {...} ; foo() Similar keywords while if switch return
  • 106.
    Lightweight Parser: Source Code → Abstract Representation → Temporal Properties
  • 107.
    Lightweight Parser: Source Code → Abstract Representation → Temporal Properties (language-independent lightweight parsing)
  • 108.
    Source Code → Abstract Representation → Temporal Properties
  • 109.
    Source code: int j; int fA; int fB = open("newFile"); fA = open("myFile"); j = 7; while (j > 3) { read(fA); write(fB, "Hello"); j--; } close(fA); close(fB);
  • 110.
    Source code: int j; int fA; int fB = open("newFile"); fA = open("myFile"); j = 7; while (j > 3) { read(fA); write(fB, "Hello"); j--; } close(fA); close(fB); Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB)
  • 111.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB)
  • 112.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA)
  • 113.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB)
  • 114.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB). Temporal properties for fA: open() < read()
  • 115.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB). Temporal properties for fA: open() < read(); open() < close()
  • 116.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB). Temporal properties for fA: open() < read(); open() < close(); read() < read()
  • 117.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB). Temporal properties for fA: open() < read(); open() < close(); read() < read(); read() < close()
  • 118.
    Abstract representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB). Calls on fA: fA: open(CONST); read(fA); close(fA). Calls on fB: fB: open(CONST); write(fB, CONST); close(fB). Temporal properties for fA: open() < read(); open() < close(); read() < read(); read() < close(). Temporal properties for fB: open() < write(); open() < close(); write() < write(); write() < close()
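The step from the abstract representation to per-variable temporal properties can be sketched as below. This is a rough illustration under assumptions, not the actual lightweight parser: the class `AbstractTrace` is hypothetical, abstract calls are given as plain {variable, function} pairs, and the loop body is simply listed twice to model repetition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: names and the unroll-twice choice are assumptions.
final class AbstractTrace {
    // Abstract calls as {variable, function}; the loop body appears twice.
    static List<String[]> example() {
        return List.of(
            new String[] {"fB", "open"},
            new String[] {"fA", "open"},
            new String[] {"fA", "read"},
            new String[] {"fB", "write"},
            new String[] {"fA", "read"},
            new String[] {"fB", "write"},
            new String[] {"fA", "close"},
            new String[] {"fB", "close"});
    }

    // Projects the trace onto each variable, then records "a() < b()"
    // whenever a is called before b on that variable.
    static Map<String, Set<String>> mine(List<String[]> events) {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        for (String[] e : events) {
            calls.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        Map<String, Set<String>> properties = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
            List<String> seq = entry.getValue();
            Set<String> props = new LinkedHashSet<>();
            for (int i = 0; i < seq.size(); i++) {
                for (int j = i + 1; j < seq.size(); j++) {
                    props.add(seq.get(i) + "() < " + seq.get(j) + "()");
                }
            }
            properties.put(entry.getKey(), props);
        }
        return properties;
    }

    public static void main(String[] args) {
        mine(example()).forEach((v, p) -> System.out.println(v + ": " + p));
    }
}
```

For this trace the sketch yields four properties each for fA and fB, the same eight that the slide lists. Working on names and call order alone is what keeps the approach independent of the source language.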
  • 120.
  • 122.
  • 123.
    6,097 C projects (bar chart, axis 0 to 8,000)
  • 124.
    Lines of code (axis 0 to 200,000,000) · 6,097 C projects (axis 0 to 8,000)
  • 125.
    201,321,237 lines of code · 6,097 C projects
  • 126.
  • 127.
  • 128.
  • 129.
  • 130.
  • 131.
    18 hours analysis time, single core
  • 132.
    11 million lines of code per hour
  • 133.
  • 134.
    static int dcc_listen_init(…) { dcc->sok = socket(…); if (…) { while (…) { … = bind (dcc->sok, …); } /* with a small port range, reUseAddr is needed */ setsockopt (dcc->sok, …, SO_REUSEADDR, …); } listen (dcc->sok, …); }
  • 135.
    static int dcc_listen_init(…) { dcc->sok = socket(…); if (…) { while (…) { … = bind (dcc->sok, …); } /* with a small port range, reUseAddr is needed */ setsockopt (dcc->sok, …, SO_REUSEADDR, …); } listen (dcc->sok, …); } [slide annotation on setsockopt(): should be called before bind()]
  • 136.
    static int find_file(…) { DIR *dirp; struct dirent *dirinfo; … dirp = opendir("."); if (dirp == NULL) { … } while ((dirinfo = readdir(dirp)) != NULL) { … } rewinddir(dirp); return 1; }
  • 137.
    static int find_file(…) { DIR *dirp; struct dirent *dirinfo; … dirp = opendir("."); if (dirp == NULL) { … } while ((dirinfo = readdir(dirp)) != NULL) { … } rewinddir(dirp); return 1; } [slide annotation on rewinddir(): should call closedir() instead]
  • 139.
  • 142.
    Check my Code • Check your code against the wisdom of Linux • Builds on millions of mined specifications • Detects problems no other tool can detect www.checkmycode.org
  • 143.
    Check my Code • Check your code against the wisdom of Linux • Builds on millions of mined specifications • Detects problems no other tool can detect www.checkmycode.org [slide annotation: Database available for download]
  • 144.
  • 145.
  • 146.
    microsoft word mobile phones travel booking operating systems airplane control banking systems
  • 147.
    microsoft word mobile phones travel booking operating systems airplane control banking systems easy to mine
  • 148.
  • 149.
    Challenges • Mining complete specifications
  • 150.
    Challenges • Mining complete specifications • Finding relevant abstractions
  • 151.
    Challenges • Mining complete specifications • Finding relevant abstractions • Producing readable specifications
  • 152.
    Challenges • Mining complete specifications • Finding relevant abstractions • Producing readable specifications • Integrating specification mining and programming
  • 153.
    Andrzej Wasylkowski Christian Lindig Natalie Gruska
  • 154.
  • 155.
  • 156.
  • 157.
  • 158.
  • 159.

Editor's Notes

  • #23–#28 You talk to these people, and you immediately realize they're smart. They're really smart – Michael got an MSc in maths and CS at the age of 21, got his PhD at 24, and became a professor at the age of 27. Today, he's the best-paid professor in Germany.
  • #29–#33 They chose to do these other things, not because they are easy, but because they are hard. Hard to verify, this is. Many things are of that kind. However… notice that all these problems can be stated in very simple terms.
  • #34 What do I mean by "easy to specify"? Here's something that's hard to verify – sorting.
  • #35–#36 Tell story of first NORA talk. \forall i \in \{0, \dots, |x'|\} :&: x'[i] < x'[i + 1] \\ |x| = |x'| \\ \forall i \in \{0, \dots, |x|\} :&: \iota i' \in \{0, \dots, |x'|\}: x[i] = x'[i'] \\ \forall i' \in \{0, \dots, |x'|\} :&: \iota i \in \{0, \dots, |x|\}: x'[i'] = x[i]
  • #37 We can introduce a vocabulary, and do things incrementally, but the burden remains. \text{is-sorted}(x') \land \text{is-permutation}(x, x')
  • #38–#45 It's nice to know that MS Word won't dereference null pointers, but will it print my text? Full of functional properties.
  • #46 Why is it that things are hard to specify? ⇒ New language, ⇒ Effort duplicated, ⇒ Can't abstract from details
  • #49–#50 and leverage the knowledge of 50 years of programming! This is what my talk today is about. In fact, it's about mining specifications from 6,000 projects – the largest such attempt ever.
  • #51 Dynamic invariants – mined from executions. Work by Michael Ernst – my big inspiration. Describe what should hold – but not how to get there.
  • #52 API usage – as mined from executions. Describe what holds – and how to achieve it!
  • #106–#109 This would be a pattern, if it were not for the missing element
  • #110–#114 We can detect such gaps by looking at overlapping patterns (concepts)
  • #115 Produced in 8 minutes on this machine
  • #118–#119 On encountering a wrong typecode, visitNEWARRAY() should report the typecode to the user. However, it fails to do so, as it uses '+t+' instead of "+t+" when constructing the second parameter to the constraintViolated() method, causing the string '+t+' to be interpreted verbatim: the message contains '+t+' rather than the typecode in t. OP-Miner reports this as an OP violation: the second parameter of constraintViolated() should be the result of a StringBuffer.toString() method call, i.e. a constructed string rather than a constant string. The rationale for using a constructed string is to include some information about the violation.
  • #120–#121 In 48 cases: argument comes from String() constructor; only in 3 cases: from array
  • #122–#123 Code smell → does not result in errors, but may cause maintainability problems. Defects → reported & verified.
  • #124 44% holds for AspectJ; same for other projects. Lots of subtle defects in production code. Unclear whether these would be found by other means.
  • #125–#127 and leverage the knowledge of 50 years of programming! This is what my talk today is about
  • #128 The introductory story tells of Francis Galton's surprise that visitors to a livestock exhibition, taking part in a prize contest, estimated the slaughter weight of an ox accurately when the mean of all estimates was taken as the group's estimate. (The group's estimate was even better than that of every individual participant, some butchers among them.)
  • #129 First thing we needed was a lightweight parser
  • #132–#133 We therefore have to be able to analyze large amounts of code – ideally source code.
  • #144 Next thing we needed was thousands of projects
  • #145–#148 We have 6097 projects in our reference database. Their size ranges from 7 (for openssl-blacklist_0.4.2 and openvpn-blacklist_0.3) to 5,491,951 (for linux-2.6.29) SLOC (generated using David A. Wheeler's 'SLOCCount'; includes only .c files). Some other statistics: first quartile: 1093; third quartile: 16160; median: 4162; mean: 33020
  • #158–#159 Defect in Conspire 0.20
  • #160–#161 Defect in cksfv-1.3.13
  • #165 As a special treat to SCAM attendees, we're making all of our database available – today!
  • #166–#167 Coming back to the beginning of my talk – are we facing a specification crisis? Yes.
  • #168 But we can alleviate it
  • #169 by reusing and abstracting from all the code that's around.
  • #170–#173 But still, we just scratch the surface of the knowledge that's in there. Plenty of work lies ahead of us.
  • #174 But with these future challenges, let's not forget past challenges. My students faced these challenges not because they were easy, but because they were hard. And I am very grateful for the wonderful results they achieved.