FSE'08 Doctoral Symposium


  1. Trends in Code. Yana Momchilova Mileva, Saarland University, Germany. Advisor: Prof. Andreas Zeller. [Title chart: values 0 to 100 over the years 2001 to 2007.]
  2. Google Zeitgeist, Nov 4, 2008. Highlighted query: “david plouffe”. Top searches:
       1. mccain concession speech
       2. david plouffe
       3. uncle tom
       4. did prop 8 pass
       5. mccain concedes
       6. obama acceptance speech
       7. david pluff
       8. california election results 2008
       9. obama elected president
      10. obama campaign manager
  3. The Code Evolves. [Chart: lines of code in Rhino, 1999 to 2007; y-axis from 0 to 30,000.]
  4. The Questions. How does this affect the future of the code? Evolution of code has an impact on the system. Can this evolution information prevent code defects?
  5. Evolution of Features
       • Variables
       • Import statements
       • Packages
       • Method calls
       • Word stems
  6. Evolution Patterns. [Six small charts over the years 1999 to 2005, each with its own y-axis scale; the plotted series are not named in the transcript.]
  7. Google Zeitgeist. [Two charts for the query “david plouffe”: United States and Germany.]
  8. All is Project-specific. [Two charts of occurrences of import java.util.Stack: in Eclipse, 2001 to 2005, y-axis up to 60, labeled “Popular!”; in Rhino, 1999 to 2005, y-axis up to 10, labeled “Old-fashioned!”.]
  9. Evolution of Features
       • Variables
       • Import statements
       • Packages
       • Method calls
       • Word stems
  10. Import Statements: number of occurrences per year in Eclipse. (The slide's graphical “Evolution patterns” column is omitted.)

      Imported class                2001   2002   2003   2004   2005   2006
      java.util.List                 455    942   1136   1631   2096   2480
      swt.layout                     335    229    268    372    314    328
      internal.base.model.plugin     119      0      0      0      0      0
      java.util.Stack                 19     22     31     26     38     57
  11. Import Statements: number of occurrences per year in Rhino. (The slide's graphical “Evolution patterns” column is omitted.)

      Imported class          1999   2000   2001   2002   2003   2004   2005   2006   2007
      java.io.Serializable       0      0      4      9     10     10     10     13     13
      java.io.IOException       11     10      7     11      9      9      6      8      8
      java.util.Hashtable       32     32     16     12     14     14     13     13     13
      java.util.Stack            8      8      3      0      0      0      0      0      0
  12. Deletion Patterns. [Chart: occurrences of import java.util.Stack in Rhino, 1999 to 2005; y-axis from 0 to 10.]
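
A deletion pattern like the one charted above can be checked mechanically from yearly counts. The following is a minimal Java sketch, not the author's definition: the class name, the method name, and the criterion (the count reaches zero and stays there until the last recorded year) are assumptions made for illustration. The two series are taken from the Rhino tables in this deck.

```java
class DeletionPatternSketch {
    // Returns the index of the first year from which the count is zero
    // through the end of the series, or -1 if there is no such year
    // (the token was never used, or is still in use in the last year).
    static int deletionIndex(int[] countsPerYear) {
        int lastNonZero = -1;
        for (int i = 0; i < countsPerYear.length; i++) {
            if (countsPerYear[i] > 0) {
                lastNonZero = i;
            }
        }
        if (lastNonZero == -1 || lastNonZero == countsPerYear.length - 1) {
            return -1;
        }
        return lastNonZero + 1;
    }

    public static void main(String[] args) {
        // java.util.Stack imports in Rhino, 1999 to 2007.
        int[] stack = {8, 8, 3, 0, 0, 0, 0, 0, 0};
        // reportConvError calls in Rhino: drops to zero, then reappears.
        int[] reportConvError = {30, 30, 16, 0, 0, 17, 17, 17, 17};

        System.out.println(deletionIndex(stack));           // 3, i.e. 2002
        System.out.println(deletionIndex(reportConvError)); // -1, not deleted
    }
}
```

Requiring the count to stay at zero keeps a temporary dip such as reportConvError from being reported as a deletion.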
  13. What happened to Stack?

      Before:

          package org.mozilla.javascript;
          import java.util.Stack;
          ...
          loops = new Stack ();
          ...
          for (int i = loops.size()-1; i >= 0; i--) {
              Node n = (Node) loops.elementAt(i);
              if (n.getType() == TokenStream.LABEL) {
                  ...
              }
          }

      After:

          package org.mozilla.javascript;
          import java.util.Vector;
          ...
          loops = new ObjArray ();
          ...
          for (int i = loops.size()-1; i >= 0; i--) {
              Node n = (Node) loops.get(i);
              if (n.getType() == TokenStream.LABEL) {
                  ...
              }
          }

      Commit message: “I replaced Stack by ObjArray... It avoids unnecessary synchronization and save memory. To simplify the replacement I added to ObjArray and ObjToIntMap few utility methods.”
  14. Evolution of Features
       • Variables
       • Import statements
       • Packages
       • Method calls
       • Word stems
  15. Method Calls: number of occurrences per year in Eclipse. (The slide's graphical “Evolution patterns” column is omitted.)

      Method name               2001    2002    2003    2004    2005    2006
      append ( )                4522   11473   20215   30658   40169   46913
      getMinorComponent ( )       20      18      34      33      20      24
      getVersionStr ( )           53      25      25       0       0       0
      getModifiedElement ( )       3       3       0      10      16      17
  16. Method Calls: number of occurrences per year in Rhino. (The slide's graphical “Evolution patterns” column is omitted.)

      Method name            1999   2000   2001   2002   2003   2004   2005   2006   2007
      getProperty ( )           2      8     10     20     28     37     37     37     41
      addByteCode ( )         248    220    108    106      0      0      0      0      0
      reportConvError ( )      30     30     16      0      0     17     17     17     17
      charAt ( )              209    242    127    132    140    144    146    147    177
  17. Method Calls Deletion

      Before:

          classFile = new ClassFileWriter (generatedClassName, superClassName, itsSourceFile);
          ...
          for (int i = 0; i < scriptOrFn.getParamCount ( ); i++) {
              push (i);
              addByteCode (ByteCode.ALOAD, 4);
              ...
          }
          ...
          private void addByteCode (byte theOpcode) {
              classFile.add(theOpcode);
          }

      Commit message: “Renaming Codegen.classFile to Codegen.cfw and removal of methods like push/load/store/add in favour of directly calling ClassFileMethods.”

      After:

          cfw = new ClassFileWriter (generatedClassName, superClassName, itsSourceFile);
          ...
          for (int i = 0; i < scriptOrFn.getParamCount ( ); i++) {
              cfw.addPush (i);
              cfw.add (ByteCode.ALOAD, 4);
              ...
          }
  18. Evolution of Features
       • Variables
       • Import statements
       • Packages
       • Method calls
       • Word stems, for example getName = {get, name}
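
The getName = {get, name} example suggests that identifiers are broken into their constituent words before counting. The Java sketch below shows one way to do that, splitting on camelCase boundaries and underscores and lowercasing the parts. The class and method names are hypothetical, and no linguistic stemming is applied; whether the original tool does more than this is not stated on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordStemSketch {
    // Splits an identifier at lower-to-upper case boundaries and at
    // underscores, and returns the lowercased parts in order.
    static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])|_")) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("getName"));        // [get, name]
        System.out.println(split("getNextSibling")); // [get, next, sibling]
        System.out.println(split("gtk_new_system")); // [gtk, new, system]
    }
}
```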
  19. Word Stems: number of occurrences per year in Rhino. (The slide's graphical “Evolution patterns” column is omitted.)

      Word stem   1999   2000   2001   2002   2003   2004   2005   2006   2007
      get         3626   3750   1775   1803   1640   1474   1493   1502   1647
      set          488    488    259    316    303    291    292    297    318
      feature        0      0     12     16     13     21     21     23     32
      system         2      2      0      0      0      0      0      1      3
  20. Learning from Evolution. [Pipeline diagram; its flattened labels read: tokens, tokens db (CVS data), evolution, deletion, patterns, program, analyzer, patterns violations, point to defect locations.]
  21. Learning from Evolution. [Pipeline diagram; its flattened labels read: tokens, tokens db (CVS data), evolution, deletion, patterns, analyzer, patterns violations, issue a warning!]
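
The analyzer step on this slide can be pictured as a lookup: tokens in new code are checked against the project's history, and a warning is issued when a token matches a deletion pattern. The Java sketch below is only an illustration under that reading. The class, the method names, the warning text, and the rule "used at some point, zero in the final year" are all assumptions; the counts come from the Rhino method-call table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeletionWarningSketch {
    // Assumed rule: a token is deleted if it was used at some point
    // and its count in the final recorded year is zero.
    static boolean isDeleted(int[] countsPerYear) {
        boolean everUsed = false;
        for (int count : countsPerYear) {
            if (count > 0) {
                everUsed = true;
            }
        }
        return everUsed && countsPerYear[countsPerYear.length - 1] == 0;
    }

    // Returns one warning per token in the new code whose history
    // shows a deletion pattern. Tokens without history are ignored.
    static List<String> warnings(Map<String, int[]> history, List<String> tokensInNewCode) {
        List<String> result = new ArrayList<>();
        for (String token : tokensInNewCode) {
            int[] counts = history.get(token);
            if (counts != null && isDeleted(counts)) {
                result.add("warning: '" + token + "' was removed from this project");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> history = new LinkedHashMap<>();
        history.put("getProperty", new int[]{2, 8, 10, 20, 28, 37, 37, 37, 41});
        history.put("addByteCode", new int[]{248, 220, 108, 106, 0, 0, 0, 0, 0});

        List<String> newCode = Arrays.asList("getProperty", "addByteCode");
        for (String warning : warnings(history, newCode)) {
            System.out.println(warning);
        }
    }
}
```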
  22. Combining Patterns. [Three charts over 1999 to 2007: getNextSibling ( ) in Rhino, getNext ( ) in Rhino, and their combined pattern.] Detected a substitution!
  23. Combining Patterns

      Project                   Token type   Old token              New token            # substitutions
      Rhino (31 in total)       m call       getNextSibling         getNext              12
                                m call       getShort               getIndex              8
                                m call       generateCodeFromNode   generateExpression    8
                                m call       reportError            reportSyntaxError     9
                                ...
      Eclipse (1864 in total)   m call       addVariable            addReslover          48
                                m call       outputDelimiter        outputIn             29
                                m call       gtk_new                gtk_new_system       18
                                m call       getString              translateString      16
                                ...
  24. Combining Patterns

      Before:

          ...
          Node lhs = n.getFirstChild();
          Node rhs = lhs.getNextSibling();
          lookForVariablesAndCalls(rhs, liveSet, theVariables);
          ...

      After:

          ...
          Node lhs = n.getFirstChild();
          Node rhs = lhs.getNext();
          lookForVariablesAndCalls(rhs, liveSet, theVariables);
          ...

      [Two charts, 1999 to 2005, one under each snippet.]

      Commit message: “I removed method duplication in Node where getNext() was duplicated as getNextSibling() and code was using both of them and similarly for getFirstChild()/getFirst().”
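
The slides do not state how two patterns are combined into a substitution. One plausible reading is that the old token disappears in the same year the new token gains roughly as many occurrences. The Java sketch below implements only that assumed rule; the class name, the method name, the tolerance parameter, and the counts in main are all hypothetical and are not data from Rhino or Eclipse.

```java
class SubstitutionSketch {
    // Assumed rule: there is a year in which the old token drops to zero
    // and the new token's gain is within `tolerance` (a fraction of the
    // old token's loss) of that loss.
    static boolean looksLikeSubstitution(int[] oldCounts, int[] newCounts, double tolerance) {
        int years = Math.min(oldCounts.length, newCounts.length);
        for (int y = 1; y < years; y++) {
            boolean oldVanished = oldCounts[y - 1] > 0 && oldCounts[y] == 0;
            int lost = oldCounts[y - 1] - oldCounts[y];
            int gained = newCounts[y] - newCounts[y - 1];
            if (oldVanished && gained > 0 && Math.abs(gained - lost) <= tolerance * lost) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical yearly counts, for illustration only.
        int[] oldToken = {120, 130, 125, 0, 0};
        int[] replacement = {40, 45, 50, 170, 175};
        int[] unrelated = {40, 45, 50, 52, 55};

        System.out.println(looksLikeSubstitution(oldToken, replacement, 0.2)); // true
        System.out.println(looksLikeSubstitution(oldToken, unrelated, 0.2));   // false
    }
}
```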
  25. Learning from Evolution. [Pipeline diagram; its flattened labels read: tokens, tokens db (CVS data), evolution, deletion, combined, patterns, analyzer, patterns violations, issue a warning!, recommend substitution.]
  26. Future Work: extend the tokens list
       • Variables
       • Import statements
       • Packages
       • Method calls
       • Word stems
       • More features
  27. Future Work: context matters. getShort( ) was replaced by getIndex( ) in Rhino, but not everywhere:

          ...
          case LINE_ICODE: {
              int line = getShort (iCode, pc + 1);
              ...
          }
          ...

      In 100% of the places where ‘LINE_ICODE’ was used, getShort( ) was not replaced by getIndex( ).
  28. Future Work: program analysis features. Evolution and trends of sequences of method calls, for example the same two calls in either order:

          ...
          scriptOrFn.getParamCount ( );
          addByteCode (ByteCode.ALOAD, 4);
          ...

          ...
          addByteCode (ByteCode.ALOAD, 4);
          scriptOrFn.getParamCount ( );
          ...
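
Tracking sequences rather than single calls means counting ordered pairs (or longer runs) of calls, so that the two orderings on this slide are distinct features. The Java sketch below counts adjacent call pairs in a list of method names. It assumes the call names have already been extracted in source order; the class name, the method name, and the "a -> b" key format are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CallSequenceSketch {
    // Counts each ordered pair of adjacent calls, keyed as "first -> second".
    static Map<String, Integer> pairCounts(List<String> callsInOrder) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < callsInOrder.size(); i++) {
            String pair = callsInOrder.get(i) + " -> " + callsInOrder.get(i + 1);
            counts.merge(pair, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> calls = Arrays.asList(
                "getParamCount", "addByteCode", "getParamCount", "addByteCode");
        // getParamCount -> addByteCode occurs twice, the reverse order once.
        System.out.println(pairCounts(calls));
    }
}
```

Computing these pair counts per year would give sequence features the same kind of yearly series as the single-token tables earlier in the deck.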
  29. Planned Evaluation
       • Training and testing data sets:
           • learn evolution patterns from the training set;
           • predict deletions in the testing set.
       • Recommendation tool, with user studies in:
           • the open-source community;
           • the closed-source community.
  30. Related Work: Thomas Zimmermann, Beat Fluri, Stephan Diehl.
  31. Summary. [Recap of two earlier slides: the Evolution Patterns charts and the Learning from Evolution pipeline diagram.]
