PhD Proposal



  1. 1. Techniques for Detecting and Preventing Copy-and-Paste Errors during Software Development. A Dissertation Proposal by Patricia Jablonski, Engineering Science, Clarkson University, September 5, 2007
  2. 2. Outline <ul><li>Copying and pasting code </li></ul><ul><li>Modifying copy-and-pasted code </li></ul><ul><li>Our proposed solution (CnP) </li></ul><ul><li>Our proof of concept (CReN) </li></ul><ul><li>Demo of CReN </li></ul><ul><li>Related Eclipse features </li></ul><ul><li>Evaluation plan </li></ul><ul><li>Proposed plan </li></ul>
  3. 3. Copying and Pasting Code <ul><li>A common form of software reuse </li></ul><ul><ul><li>Reuse copied code as a template </li></ul></ul><ul><li>Why copy and paste code? </li></ul><ul><ul><li>Duplicate code exactly </li></ul></ul><ul><ul><li>Defer creating an abstraction </li></ul></ul><ul><ul><li>Experiment and test </li></ul></ul><ul><li>Results in code clones </li></ul><ul><ul><li>Multiple similar code fragments </li></ul></ul><ul><li>What happens when code needs modification? </li></ul>
  4. 4. Modifying Copy-and-Pasted Code (1 of 2) <ul><li>Expensive software maintenance </li></ul><ul><ul><li>Original copied code could be erroneous </li></ul></ul><ul><ul><li>Changes need to be made to each instance </li></ul></ul><ul><li>Solutions: clone detection and removal, clone tracking tools </li></ul><ul><ul><li>Linked editing and simultaneous editing </li></ul></ul><ul><ul><ul><li>Clones are selected and linked together so that modifications in one clone can be made to all of the clones that it is linked to simultaneously </li></ul></ul></ul>
  5. 5. Modifying Copy-and-Pasted Code (2 of 2) <ul><li>Manual modifications can result in undetected errors and unintended inconsistencies </li></ul><ul><li>Solution: error detection tools </li></ul><ul><ul><li>CP-Miner tool </li></ul></ul><ul><ul><ul><li>Uses identifier mapping, “forget-to-change” vs. “change”, and unchanged ratio </li></ul></ul></ul><ul><ul><li>DECKARD-based tool </li></ul></ul><ul><ul><ul><li>Uses a count of unique identifiers </li></ul></ul></ul><ul><li>What about proactive error prevention? </li></ul>
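The "unchanged ratio" idea on the slide above can be illustrated with a short sketch. This is not CP-Miner's code: it is a minimal Python model that assumes the identifiers of the copied and pasted fragments have already been extracted and aligned by position. The function names and the `threshold` default are hypothetical choices made for this illustration.

```python
from collections import Counter


def unchanged_ratios(copied_ids, pasted_ids):
    """Return {original name: fraction of its occurrences left unchanged}.

    copied_ids and pasted_ids are equal-length lists of identifier names,
    aligned by position (i-th identifier of the copy vs. of the paste).
    """
    if len(copied_ids) != len(pasted_ids):
        raise ValueError("fragments must contain the same number of identifiers")
    total = Counter(copied_ids)
    unchanged = Counter(
        old for old, new in zip(copied_ids, pasted_ids) if old == new
    )
    return {name: unchanged[name] / total[name] for name in total}


def suspicious(copied_ids, pasted_ids, threshold=0.4):
    """Names renamed in most places but not all: likely 'forget-to-change'.

    threshold is an illustrative cut-off, not a value taken from the slides.
    """
    ratios = unchanged_ratios(copied_ids, pasted_ids)
    return sorted(name for name, r in ratios.items() if 0 < r <= threshold)
```

A ratio of 0 means the identifier was renamed everywhere (consistent), 1 means it was never renamed (probably intentional), and a small non-zero value suggests that one occurrence was overlooked.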
  6. 6. Our Proposed Solution (CnP) <ul><li>Provide automated tool support in the IDE </li></ul><ul><ul><li>Eclipse, Java </li></ul></ul><ul><li>Improve software quality during development </li></ul><ul><li>What are the main features of the CnP tool? </li></ul><ul><ul><li>Tracks & highlights copy-pasted statements </li></ul></ul><ul><ul><li>Detects inconsistencies based on inferences of the programmer’s intention </li></ul></ul><ul><ul><ul><li>Inconsistencies are based on inferred rules </li></ul></ul></ul><ul><li>What is the current status of CnP? </li></ul>
  7. 7. Our Proof of Concept (CReN) Design and Implementation (1 of 5) <ul><li>Consistent renaming usage pattern </li></ul><ul><ul><li>Identifier (for example, variable name) renaming within a copy-and-paste clone </li></ul></ul><ul><li>Manual renaming can result in inconsistencies </li></ul><ul><li>What are the main features of the CReN tool? </li></ul><ul><ul><li>Tracks & highlights copy-pasted statements </li></ul></ul><ul><ul><li>Automatically renames all instances of an identifier in a group when any one instance in the group is modified </li></ul></ul><ul><ul><li>The inferred rules can be refined by the programmer </li></ul></ul>
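The automatic renaming step can be sketched as follows. This is a hedged Python model, not CReN's Eclipse implementation: it assumes the tool already knows the character offsets of every instance in a group, and `rename_group` is a hypothetical helper name.

```python
def rename_group(source, occurrences, old_name, new_name):
    """Replace old_name with new_name at every offset in occurrences.

    occurrences holds the character offsets of the group's instances in
    source. Returns (new_source, new_offsets) so the tracked positions
    stay valid after the text has grown or shrunk.
    """
    delta = len(new_name) - len(old_name)
    pieces, new_offsets, cursor = [], [], 0
    for k, start in enumerate(sorted(occurrences)):
        if source[start:start + len(old_name)] != old_name:
            raise ValueError(f"no {old_name!r} at offset {start}")
        pieces.append(source[cursor:start])   # text between instances
        pieces.append(new_name)               # the renamed instance
        new_offsets.append(start + k * delta) # shifted by earlier renames
        cursor = start + len(old_name)
    pieces.append(source[cursor:])
    return "".join(pieces), new_offsets
```

For example, `rename_group("x = x + y", [0, 4], "x", "total")` rewrites both instances, while passing only `[0]` leaves the second `x` alone, which is how an instance excluded from the group would behave.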
  8. 9. Our Proof of Concept (CReN) Design and Implementation (2 of 5) <ul><li>Tracking copy-and-paste clones </li></ul><ul><ul><li>No clone detection tool or manual selection required </li></ul></ul><ul><ul><li>Clone region: Java file name + clone’s range </li></ul></ul><ul><li>Obtaining ASTs from clone locations </li></ul><ul><ul><li>Abstract syntax tree (AST) API in Eclipse </li></ul></ul><ul><ul><li>AST captures the source code characters & their absolute position in the source code </li></ul></ul><ul><ul><li>Each ASTNode has starting/ending positions denoting the character range that the node covers in the source file </li></ul></ul>
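A clone region of the form "file name + range" can be modeled in a few lines. The sketch below is an assumption-laden Python illustration, not CReN's data structure: the class and method names are hypothetical, and node ranges are passed as (start, length) pairs in the way Eclipse's `ASTNode.getStartPosition()` and `getLength()` report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneRegion:
    file_name: str
    offset: int   # absolute character offset of the clone's first character
    length: int   # number of characters in the clone

    @property
    def end(self):
        return self.offset + self.length

    def contains(self, node_start, node_length):
        """True if the node's [start, start + length) range is inside the clone."""
        return self.offset <= node_start and node_start + node_length <= self.end

    def after_edit(self, edit_offset, removed, inserted):
        """Return the region adjusted for a text edit in the same file.

        removed and inserted are character counts. Edits that straddle a
        clone boundary are deliberately left unhandled in this sketch.
        """
        delta = inserted - removed
        if edit_offset + removed <= self.offset:   # edit ends before the clone
            return CloneRegion(self.file_name, self.offset + delta, self.length)
        if edit_offset >= self.offset and edit_offset + removed <= self.end:
            return CloneRegion(self.file_name, self.offset, self.length + delta)
        if edit_offset >= self.end:                # edit starts after the clone
            return self
        raise ValueError("edit straddles the clone boundary")
```

Keeping the region up to date on every edit is what lets a tool track a clone without re-running clone detection: AST nodes whose ranges satisfy `contains` belong to the clone.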
  9. 11. Our Proof of Concept (CReN) Design and Implementation (3 of 5) <ul><li>Matching identifiers between clones </li></ul><ul><ul><li>Determine relationships of identifiers between copy-and-pasted code fragments </li></ul></ul><ul><ul><li>Identifiers in the copied code are matched with their corresponding identifiers in the pasted code </li></ul></ul><ul><ul><li>When the code has just been pasted, its contents are identical to the copied fragment, only at a different location </li></ul></ul><ul><ul><li>Rules are inferred across all clones </li></ul></ul>
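Because a fresh paste is character-for-character identical to the copied fragment, identifiers can be matched purely by relative offset. The sketch below shows that idea in Python under simplifying assumptions: identifiers are found with a regular expression instead of an AST, `JAVA_KEYWORDS` is a small illustrative subset, and string literals and comments are not handled. All names are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Small illustrative subset of Java keywords, not the full list.
JAVA_KEYWORDS = {"for", "if", "else", "while", "int", "return", "new"}


def identifier_offsets(fragment):
    """Relative (offset, name) pairs for identifiers in a code fragment."""
    return [
        (m.start(), m.group())
        for m in IDENT.finditer(fragment)
        if m.group() not in JAVA_KEYWORDS
    ]


def match_identifiers(source, copy_start, paste_start, length):
    """Pair each identifier in the copied range with its twin in the paste.

    Returns (name, offset_in_copy, offset_in_paste) triples.
    """
    copied = source[copy_start:copy_start + length]
    pasted = source[paste_start:paste_start + length]
    if copied != pasted:
        raise ValueError("offset matching assumes the paste is still unmodified")
    return [
        (name, copy_start + rel, paste_start + rel)
        for rel, name in identifier_offsets(copied)
    ]
```

The correspondence has to be captured at paste time; once the programmer starts editing the pasted fragment, the two ranges diverge and offsets alone no longer line up.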
  10. 13. Our Proof of Concept (CReN) Design and Implementation (4 of 5) <ul><li>Partitioning identifiers into groups </li></ul><ul><ul><li>Determine relationships of identifiers within copy-and-pasted code fragments </li></ul></ul><ul><ul><li>Identifiers in the copied and pasted code are partitioned into groups and mapped to each other </li></ul></ul><ul><ul><li>Defines the group of identifiers that are to be renamed together </li></ul></ul><ul><ul><li>Want group of identifiers that resolve to the same variable – use binding, if available </li></ul></ul>
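The grouping rule (prefer the binding, fall back to the name) might look like the following Python sketch. It is an illustration only: `Occurrence`, its `binding` field, and the key format are hypothetical stand-ins for whatever the Eclipse AST resolves an identifier to.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    name: str
    offset: int
    # Hypothetical key for the resolved declaration; None when unresolved.
    binding: Optional[str] = None


def partition(occurrences):
    """Group occurrences that should be renamed together.

    Occurrences with a binding are grouped by that binding, so two
    variables that share a name but resolve to different declarations
    stay in separate groups. Unresolved ones are grouped by name.
    """
    groups = defaultdict(list)
    for occ in occurrences:
        if occ.binding is not None:
            key = ("binding", occ.binding)
        else:
            key = ("name", occ.name)
        groups[key].append(occ)
    return dict(groups)
```

Grouping by binding is the stricter choice: a name-only fallback can merge unrelated variables, which is one reason the programmer needs a way to refine the inferred groups.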
  11. 15. Our Proof of Concept (CReN) Design and Implementation (5 of 5) <ul><li>Refining the inferred rules </li></ul><ul><ul><li>When the code is initially pasted, the inferred rule assumes that all identifiers that would resolve to the same program entity should be renamed consistently </li></ul></ul><ul><ul><li>Programmer can choose to exclude the currently renamed identifier from the group (this instance is deleted from the vector) </li></ul></ul><ul><ul><li>The updated rule is inferred across all clones </li></ul></ul><ul><li>Let’s see if CReN can detect/prevent errors... </li></ul>
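Refining a rule by exclusion amounts to removing one instance from the group's list. A minimal Python sketch follows, assuming a group is stored as a list of character offsets (mirroring the "vector" mentioned on the slide); the class and method names are hypothetical.

```python
class RenameGroup:
    """Offsets of identifier instances that are currently renamed together."""

    def __init__(self, name, offsets):
        self.name = name
        self.offsets = sorted(offsets)

    def exclude(self, offset):
        """Drop one instance so later renames of the group leave it untouched."""
        self.offsets.remove(offset)  # raises ValueError if not a member

    def targets(self, edited_offset):
        """Instances to update when the instance at edited_offset is renamed."""
        if edited_offset not in self.offsets:
            return []  # an excluded instance no longer drives the group
        return [o for o in self.offsets if o != edited_offset]
```

For example, with `g = RenameGroup("i", [4, 10, 22])`, editing the instance at offset 4 targets offsets 10 and 22. After `g.exclude(10)`, the same edit targets only offset 22, and editing the excluded instance at 10 affects nothing else.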
  12. 16. Our Proof of Concept (CReN) Usage and Demonstration <ul><li>Three examples from literature show an inconsistent renaming of identifiers within a copy-and-pasted clone in production code </li></ul><ul><li>Z. Li, S. Lu, S. Myagmar, and Y. Zhou, “CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code”, USENIX-ACM SIGOPS Symposium on Operating Systems Design and Implementation (OSDI) , 2004. </li></ul><ul><li>B. Liblit, A. Aiken, A.X. Zheng, and M.I. Jordan, “Bug Isolation via Remote Program Sampling”, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) , 2003. </li></ul><ul><li>L. Jiang, Z. Su, and E. Chiu, “Context-Based Detection of Clone-Related Bugs”, European Software Engineering Conference (ESEC) and ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE) , 2007. </li></ul>
  13. 19. Demo of CReN <ul><li>Demonstrate how CReN would catch each identifier renaming error in the examples as if they were currently being written </li></ul><ul><li>(Some) CReN future work </li></ul><ul><ul><li>Consistent renaming of any kind of identifier </li></ul></ul><ul><ul><li>Allow “undo” of taking identifier out of group </li></ul></ul><ul><ul><li>Consistent renaming in a user-defined scope </li></ul></ul><ul><ul><li>Apply renaming across all related clones </li></ul></ul><ul><li>How are other Eclipse features related to CReN? </li></ul>
  14. 20. Related Eclipse Features <ul><li>Find & Replace </li></ul><ul><ul><li>Text-based search, manually started </li></ul></ul><ul><ul><li>Not limited to within a code fragment </li></ul></ul><ul><li>Rename Refactoring </li></ul><ul><ul><li>Automatically applies to the whole project </li></ul></ul><ul><ul><li>Binding is important for it to work </li></ul></ul><ul><li>Linked Renaming </li></ul><ul><ul><li>Like Rename Refactoring, but applies only to the current file </li></ul></ul><ul><li>What are our next steps in our research? </li></ul>
  15. 21. Evaluation Plan <ul><li>We tested CReN with the three examples </li></ul><ul><li>We plan to perform controlled experiments </li></ul><ul><ul><li>Give a homework assignment to students </li></ul></ul><ul><ul><li>Require them to use Eclipse & CnP plug-in </li></ul></ul><ul><ul><li>Have them write a suitable application </li></ul></ul><ul><li>We plan to evaluate in terms of: </li></ul><ul><ul><li>Usefulness, usability (user error), user experience, accuracy (false negatives & false positives), performance </li></ul></ul><ul><li>What is our plan after CReN is fully evaluated? </li></ul>
  16. 22. Proposed Plan <ul><li>Determine usage patterns by using clone detection tools </li></ul><ul><li>What other kinds of errors could CnP handle? </li></ul><ul><ul><li>Lexical/naming pattern inconsistencies </li></ul></ul><ul><ul><ul><li>Substring is the same on both sides of = </li></ul></ul></ul><ul><ul><ul><li>Naming pairs like left/right, top/bottom </li></ul></ul></ul><ul><ul><li>Type inconsistencies </li></ul></ul><ul><ul><ul><li>Inferences can be made about types at the same positions across clones </li></ul></ul></ul><ul><li>Improve the management and visualization of clones </li></ul>
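One possible reading of the lexical/naming check is sketched below in Python. It is speculative, since the slides describe this feature only as future work: the regular expression handles simple `lhs = rhs;` assignments only, `PAIRS` is an illustrative list, and plain substring matching will produce false positives on names such as `copyright`.

```python
import re

# Illustrative naming pairs; a real tool would need a richer list.
PAIRS = [("left", "right"), ("top", "bottom")]
# Matches simple assignments such as "a.left = b.left;" (not ==, +=, <=).
ASSIGN = re.compile(r"([\w.]+)\s*=\s*([\w.]+)\s*;")


def naming_mismatches(code):
    """Return assignments whose two sides use opposite halves of a naming pair."""
    found = []
    for m in ASSIGN.finditer(code):
        lhs, rhs = m.group(1).lower(), m.group(2).lower()
        for a, b in PAIRS:
            if (a in lhs and b in rhs) or (b in lhs and a in rhs):
                found.append(m.group(0))
                break
    return found
```

Given `node.left = src.left; node.right = src.left;`, only the second assignment is reported: it looks like a pasted line where the left-hand side was updated and the right-hand side was forgotten.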
  17. 23. Conclusion <ul><li>Copy-and-paste will remain a common programming practice, which can result in undetected errors </li></ul><ul><li>Error detection and prevention should happen during software development, not only “after-the-fact” </li></ul><ul><li>So far, we have implemented CReN, one of three parts of the proposed CnP tool </li></ul><ul><ul><li>Automatic tracking of copy-and-paste clones </li></ul></ul><ul><ul><li>Consistent renaming of identifiers within copy-and-paste clones </li></ul></ul>
  18. 24. Questions / Comments
  19. 25. Extra Slides (CReN Demo Screen Shots)