Motivation
□ >50% of maintenance time spent trying to
  understand the program
       □ Where are the features,
         reqs, etc. in the code?
       □ What is this code for?
       □ Why is it hard to
         understand and change
         the program?

Marc Eaddy           ICPC 2008               2
What is a “concern”?
             Anything that affects the implementation of a program

□ Feature, requirement, design pattern,
  coding idiom, etc.
□ Raison d'être for code
       □ Every line of code exists to satisfy some concern




Concern location problem
                Concern–code relationship hard to obtain

             [Figure: mapping between concerns and program elements]

□ Concern–code relationship undocumented
□ Reverse engineer the relationship
       □ (but which one?)
Software pruning
□ Remove code that supports certain features,
  reqs, etc.
       □ Reduce program’s footprint
       □ Support different platforms
       □ Simplify program




Prune dependency rule [ACOM’07]
□ Code is prune dependent on concern if
       □ Pruning the concern requires removing or
         altering the code
□ Must alter code that depends on removed
  code
       □ Prevent compile errors
       □ Eliminate “dead code”
□ Easy to determine/approximate
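The rule is recursive: removing code forces changes to whatever depends on it, and so on. The sketch below computes that closure with a worklist, assuming the dependencies are already available as a map from each element to its direct dependents (a hypothetical input; the element names are made up). It conservatively treats every altered element as removed, so it over-approximates the prune-dependent set.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: code that is (transitively) affected when a concern is pruned. */
final class PruneClosure {
    /**
     * @param dependents  hypothetical map from an element to the elements that
     *                    directly depend on it (a real tool derives this by program analysis)
     * @param concernCode elements that implement the concern being pruned
     * @return concernCode plus every element reachable through dependents edges;
     *         treating altered code as removed makes this an over-approximation
     */
    static Set<String> affected(Map<String, List<String>> dependents, Set<String> concernCode) {
        Set<String> result = new LinkedHashSet<>(concernCode);
        Deque<String> worklist = new ArrayDeque<>(concernCode);
        while (!worklist.isEmpty()) {
            String removed = worklist.pop();
            for (String dependent : dependents.getOrDefault(removed, List.of())) {
                if (result.add(dependent)) {   // first visit: its dependents are affected too
                    worklist.push(dependent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical elements: pruning js_join breaks its caller, which breaks a test.
        Map<String, List<String>> dependents = Map.of(
            "js_join", List.of("joinCaller"),
            "joinCaller", List.of("joinTest"));
        System.out.println(affected(dependents, Set.of("js_join"))); // prints [js_join, joinCaller, joinTest]
    }
}
```

The `result.add` check doubles as the visited set, so dependency cycles terminate.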
Automated concern location
             Concern–code relationship predicted by an “expert”

□ Experts mine clues in code, docs, etc.
□ Existing techniques use 1 or 2 experts only
□ Our solution: Cerberus
       1. Information retrieval
       2. Execution tracing
       3. Prune dependency analysis
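The slides do not spell out how the experts' predictions are merged. As an illustration only, the sketch below assumes each expert assigns elements a score normalized to [0, 1] and merges them with a weighted sum; the weights, element names, and scores are hypothetical, not Cerberus's actual parameters. PDA, which needs seeds, could then start from the top-ranked elements; that step is left out here.

```java
import java.util.*;

/** Sketch: merging the rankings of several concern location "experts". */
final class ExpertCombiner {
    /**
     * Combines per-element scores from several experts with a weighted sum.
     * Assumption: each expert's scores are already normalized to [0, 1];
     * an element an expert did not score counts as 0 for that expert.
     */
    static Map<String, Double> combine(List<Map<String, Double>> experts, double[] weights) {
        if (experts.size() != weights.length) {
            throw new IllegalArgumentException("one weight per expert");
        }
        Map<String, Double> combined = new HashMap<>();
        for (int i = 0; i < weights.length; i++) {
            for (Map.Entry<String, Double> e : experts.get(i).entrySet()) {
                combined.merge(e.getKey(), weights[i] * e.getValue(), Double::sum);
            }
        }
        return combined;
    }

    /** Returns element names sorted by descending combined score. */
    static List<String> rank(Map<String, Double> scores) {
        List<String> names = new ArrayList<>(scores.keySet());
        names.sort(Comparator.comparingDouble((String n) -> scores.get(n)).reversed());
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical IR and tracing scores for the "Array.join" requirement.
        Map<String, Double> ir = Map.of("js_join", 0.9, "js_construct", 0.2);
        Map<String, Double> tracing = Map.of("js_join", 1.0, "js_construct", 1.0, "js_toString", 0.5);
        Map<String, Double> scores = combine(List.of(ir, tracing), new double[] {0.5, 0.5});
        System.out.println(rank(scores)); // prints [js_join, js_construct, js_toString]
    }
}
```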



IR-based concern location
□ i.e., Google for code
□ Program entities are documents
□ Requirements are queries
             [Figure: the requirement “Array.join” is matched, via the term “join”, against source code identifiers such as Id_join and js_join()]
Vector space model [Salton]
□ Parse code and reqs doc to extract term vectors
       □ NativeArray.js_join() method → “native,” “array,” “join”
       □ “Array.join” requirement → “array,” “join”
□ Our contributions
       □ Expand abbreviations
             □ numconns → number, connections, numberconnections
       □ Index fields
□ Weigh terms (tf · idf)
       □ Term frequency (tf)
       □ Inverse document frequency (idf)
□ Similarity = cosine similarity between document and
  query vectors
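A compact sketch of the vector space model described above: identifiers are split on punctuation and camelCase boundaries, terms are weighted by tf · idf (raw count times ln(N/df), one common variant), and documents are ranked by cosine similarity to the query. The stop-term list that drops the js prefix (to match the slide's example) and the three method names used as documents are assumptions; abbreviation expansion and field indexing are omitted.

```java
import java.util.*;

/** Sketch of vector-space (tf-idf + cosine) concern location. */
final class VsmSketch {
    // Hypothetical stop terms; the slide's example drops the "js" prefix.
    static final Set<String> STOP = Set.of("js");

    /** Splits identifiers on punctuation and camelCase boundaries, lowercased. */
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            for (String part : token.split("(?<=[a-z0-9])(?=[A-Z])")) {
                String t = part.toLowerCase(Locale.ROOT);
                if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
            }
        }
        return out;
    }

    /** idf(t) = ln(N / df(t)), where df(t) counts documents containing t. */
    static Map<String, Double> idfWeights(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, n) -> idf.put(t, Math.log((double) docs.size() / n)));
        return idf;
    }

    /** tf-idf vector; terms never seen in the corpus get weight 0. */
    static Map<String, Double> vector(List<String> terms, Map<String, Double> idf) {
        Map<String, Double> v = new HashMap<>();
        for (String t : terms) v.merge(t, 1.0, Double::sum);            // raw term frequency
        v.replaceAll((t, tf) -> tf * idf.getOrDefault(t, 0.0));
        return v;
    }

    /** Cosine of the angle between two sparse vectors (0 if either is all zeros). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double x : b.values()) normB += x * x;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Hypothetical "documents": method names standing in for program entities.
        List<String> names = List.of("NativeArray.js_join", "NativeArray.js_reverse", "NativeString.js_concat");
        List<List<String>> docs = new ArrayList<>();
        for (String n : names) docs.add(terms(n));
        Map<String, Double> idf = idfWeights(docs);
        Map<String, Double> query = vector(terms("Array.join"), idf);    // requirement as query
        for (int i = 0; i < names.size(); i++) {
            // NativeArray.js_join scores highest
            System.out.printf("%s %.3f%n", names.get(i), cosine(vector(docs.get(i), idf), query));
        }
    }
}
```

Note that "native" appears in every document here, so its idf (and weight) is 0; idf is what lets rare, discriminating terms such as "join" dominate.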
Tracing-based concern location
□ Observe elements activated when concern is
  exercised
       □ Unit tests for each concern
       □ e.g., find elements uniquely activated by a concern
                      Unit test for “Array.join”:
                 var a = new Array(1, 2);
                 if (a.join(',') == "1,2") {
                     print("Test passed");
                 } else {
                     print("Test failed");
                 }

                 [Figure: call graph of the test run, showing js_construct and js_join]
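The “uniquely activated” heuristic reduces to a set difference once traces exist. A minimal sketch, assuming each concern's unit tests have already been traced into a set of executed element names; the second concern (Array.reverse) and its trace are hypothetical.

```java
import java.util.*;

/** Sketch: elements activated only by the tests of one concern. */
final class UniqueActivation {
    /**
     * @param traces maps each concern to the set of elements its unit tests executed
     *               (hypothetical input, e.g. collected by an execution tracer)
     * @return elements in the concern's trace that appear in no other concern's trace
     */
    static Set<String> unique(Map<String, Set<String>> traces, String concern) {
        Set<String> result = new TreeSet<>(traces.getOrDefault(concern, Set.of()));
        for (Map.Entry<String, Set<String>> e : traces.entrySet()) {
            if (!e.getKey().equals(concern)) {
                result.removeAll(e.getValue());   // drop anything another concern also ran
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> traces = Map.of(
            "Array.join", Set.of("js_construct", "js_join"),
            "Array.reverse", Set.of("js_construct", "js_reverse"));
        System.out.println(unique(traces, "Array.join")); // prints [js_join]
    }
}
```

Elements shared by many tests, such as the constructor here, drop out, which is the intent of the heuristic; the flip side is that a relevant element exercised by two concerns is dropped too.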
Prune dependency analysis
□ Infer relevant elements based on structural
  relationship to relevant element e (seed)
       □ Assumes we already have some seeds
□ Prune dependency analysis
       □ Determines prune dependency rule using
         program analysis
       □ Find references to e
       □ Find superclasses and subclasses of e

PDA example

               Source Code
             interface A {
                 public void foo();
             }
             public class B implements A {
                 public void foo() { ... }
                 public void bar() { ... }
             }
             public class C {
                 public static void main() {
                     B b = new B();
                     b.bar();
                 }
             }

               [Figure: Program Dependency Graph. C contains main; B contains foo and bar; A contains foo; B inherits A; main refs B; main calls bar]
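The two PDA queries listed earlier (references to e; superclasses and subclasses of e) can be sketched over the example's dependency graph. This assumes the graph is a list of labeled edges read off the figure (the direction of the refs edge, from main to B, is an interpretation), and it implements only direct references plus direct supertypes and subtypes; the class and method names in the sketch are hypothetical.

```java
import java.util.*;

/** Sketch of the two PDA queries from the slides, run on the example program. */
final class PdaSketch {
    enum Kind { CONTAINS, INHERITS, REFS, CALLS }

    static final class Edge {
        final String from;
        final Kind kind;
        final String to;
        Edge(String from, Kind kind, String to) { this.from = from; this.kind = kind; this.to = to; }
    }

    /** Elements that reference or call e (incoming REFS/CALLS edges). */
    static Set<String> referencesTo(List<Edge> g, String e) {
        Set<String> out = new TreeSet<>();
        for (Edge x : g) {
            if (x.to.equals(e) && (x.kind == Kind.REFS || x.kind == Kind.CALLS)) out.add(x.from);
        }
        return out;
    }

    /** Direct supertypes and subtypes of e (INHERITS edges in both directions). */
    static Set<String> supersAndSubs(List<Edge> g, String e) {
        Set<String> out = new TreeSet<>();
        for (Edge x : g) {
            if (x.kind != Kind.INHERITS) continue;
            if (x.from.equals(e)) out.add(x.to);   // e inherits x.to
            if (x.to.equals(e)) out.add(x.from);   // x.from inherits e
        }
        return out;
    }

    /** Elements inferred as related to the seed e. */
    static Set<String> infer(List<Edge> g, String e) {
        Set<String> out = new TreeSet<>(referencesTo(g, e));
        out.addAll(supersAndSubs(g, e));
        return out;
    }

    /** The example program's dependency graph, as read from the figure. */
    static List<Edge> exampleGraph() {
        return List.of(
            new Edge("C", Kind.CONTAINS, "C.main"),
            new Edge("B", Kind.CONTAINS, "B.foo"),
            new Edge("B", Kind.CONTAINS, "B.bar"),
            new Edge("A", Kind.CONTAINS, "A.foo"),
            new Edge("B", Kind.INHERITS, "A"),
            new Edge("C.main", Kind.REFS, "B"),
            new Edge("C.main", Kind.CALLS, "B.bar"));
    }

    public static void main(String[] args) {
        List<Edge> g = exampleGraph();
        System.out.println(infer(g, "B.bar")); // prints [C.main]
        System.out.println(infer(g, "B"));     // prints [A, C.main]
    }
}
```

With seed bar, the sketch infers main (its caller); with seed B, it infers main (which references B) and A (B's supertype).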




Cerberus




Cerberus ≈ PROMESIR + SNIAFL




Cerberus effectiveness
               [Chart: results for Cerberus]




Ignoring “No results found”

                      [Chart: results for Cerberus]




Future work
□ Improve PDA
       □ Reimplemented using Soot and Polyglot
       □ Generalize using prune dependency predicates
       □ Improve precision using points-to analysis
       □ Improve accuracy using
             □ Dominator heuristic
             □ Variable liveness analysis
□ Improve accuracy of Cerberus
       □ Combine experts using matrix linear regression

Cerberus contributions
□ Effectively combined 3
  concern location techniques



□ PDA boosts performance of other techniques
       [Figure: the PDA example source code and program dependency graph]




Questions?


                   Marc Eaddy
               Columbia University
             eaddy@cs.columbia.edu



