Session chair introduction – "Patricia Jablonski is a PhD student in Software Engineering at Clarkson University in Potsdam, New York. She will be presenting on the design of a proactive clone management environment based on experience with a prototype named CnP. The paper is titled 'Exploring the Design Space of Proactive Tool Support for Copy-and-Paste Programming'."
Clone detection & removal – to find clones in existing source code and replace the similar code fragments with an abstraction (i.e. function, method, procedure). Clone management – support the evolution of clones. CnP – a collection of tools and features to support the copy-and-paste programming practice.
This picture shows the clone life cycle and possible tool support in each stage. Areas where CnP has support are shown in bold. (CnP does not currently have support for inter-clone editing or refactoring).
The clone model is the basis of a proactive clone management (PCM) environment. Clone model – how individual clones and their relationship are represented (clone locations, cloning relationships). Clone locations – line ranges (imprecise clone boundary, i.e. a single line may contain multiple statements) vs. character offset and length (AST node) in a file. * Design Decision: Clones should be represented at the granularity of a character. Cloning relationships – knowledge of the clone origin vs. symmetric relationship (clone group). * Design Decision: The symmetric cloning relationship between a pair of clones should be supported (with clone origin information as optional). Other PCM environment requirements: - A PCM environment must accurately maintain the clone locations when code is added or deleted before or within clone regions. - The clone model should be persisted (saved/loaded) between IDE sessions. - A PCM environment’s clone model should cover the whole workspace, not just individual projects. - The clone model should be managed by version control systems so that it can be shared between team members.
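A minimal sketch of such a clone model follows, assuming a Python representation purely for illustration (CnP itself is an Eclipse plug-in, and the class and method names here are hypothetical, not CnP's). It shows clones located by character offset and length, a symmetric clone group with optional origin information, and how a region can be kept accurate when text is inserted or deleted before or within it. Treating an insertion exactly at the clone's start as "before the clone" is an assumed boundary policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CloneRegion:
    """One clone, located by character offset and length within a file."""
    path: str
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length

    def on_insert(self, pos: int, n: int) -> None:
        """Adjust the region after n characters are inserted at pos."""
        if pos <= self.offset:
            self.offset += n      # edit before the clone: shift it
        elif pos < self.end:
            self.length += n      # edit inside the clone: grow it
        # edit after the clone: nothing to do

    def on_delete(self, pos: int, n: int) -> None:
        """Adjust the region after n characters starting at pos are deleted."""
        del_end = pos + n
        # Deleted characters that lay before the clone shift it left.
        before = max(0, min(del_end, self.offset) - pos)
        # Deleted characters that lay inside the clone shrink it.
        inside = max(0, min(del_end, self.end) - max(pos, self.offset))
        self.offset -= before
        self.length -= inside


@dataclass
class CloneGroup:
    """Symmetric relationship: every member is a clone of every other."""
    members: List[CloneRegion] = field(default_factory=list)
    origin: Optional[CloneRegion] = None  # optional clone-origin information
```

Persisting the model between IDE sessions and sharing it through version control would then amount to serializing the groups to a file kept alongside the source, which this sketch leaves out.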
Clone capture - Track the copy and paste actions as they happen in the IDE. (proactive) --- Copy-and-paste-induced clones are known upon creation with 100% accuracy, and tracking also captures ephemeral (short-lived) clones that may disappear from the code base before clone detection tools are applied. - Import clones from clone detection tools. (retroactive) --- Captures clones in legacy or existing source code that was developed before PCM was applied, as well as clones that were not induced by copy and paste. * Goal: to capture and support the evolution of all copy-and-paste-induced clones (retroactive + proactive). * Design Problem: Only meaningful (relevant, significant) clones should be tracked, not every copy and paste. - CnP uses a configurable policy to track copied code that contains (1) more than two statements, (2) at least one conditional statement, loop statement, or method, or (3) a type definition (class or interface). * Design Problem: Allow for the removal and the merging of clones. - How can merging of clones be supported by tools?
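The capture policy described above can be sketched as a small predicate. This is an illustration under stated assumptions, not CnP's implementation: `FragmentSummary` and `CapturePolicy` are hypothetical names, the summary is assumed to be computed elsewhere (for example from the AST of the copied text), and the three criteria are assumed to be alternatives, so meeting any one of them is enough.

```python
from dataclasses import dataclass


@dataclass
class FragmentSummary:
    """Hypothetical summary of a copied fragment, e.g. derived from its AST."""
    statements: int = 0
    has_conditional: bool = False
    has_loop: bool = False
    has_method: bool = False
    has_type_definition: bool = False


@dataclass
class CapturePolicy:
    """Configurable settings; the defaults mirror the three criteria."""
    min_statements_exclusive: int = 2       # (1) more than two statements
    track_control_or_method: bool = True    # (2) conditional, loop, or method
    track_type_definitions: bool = True     # (3) class or interface

    def should_track(self, f: FragmentSummary) -> bool:
        if f.statements > self.min_statements_exclusive:
            return True
        if self.track_control_or_method and (
            f.has_conditional or f.has_loop or f.has_method
        ):
            return True
        return self.track_type_definitions and f.has_type_definition
```

With the default settings, a pasted single assignment is ignored, while a pasted loop or a pasted class is tracked.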
Clone visualization - The tracking of clones happens behind-the-scenes, so we then have to decide on how to show this information to the programmer. * Design Problem: Need to effectively display the clone’s code and navigate around a clone group. - CnP shows individual clones visually by displaying colored bars next to the clone’s source code in the editor (but this may clutter up the editor, be distracting, etc). - CnP has two other clone visualization features: diff view (CSeR) and context interaction view (warnings that are given when the pasted code includes externally declared identifiers).
CSeR (Code Segment Reuse) - Displays detailed source code commonalities and differences for more accurate code comparison. - Helps programmers better understand the clones, which could help programmers keep consistent changes between them and maintain the correspondence relationship (the necessary level of similarity that must be maintained between clones). - The clones are identical when initially pasted, with incremental changes highlighted (any unchanged code within a clone is not highlighted). - Inserts (the addition of an AST node) are highlighted in green, deletes (the removal of an AST node) are highlighted in red, updates (the modification of an existing AST node) are highlighted in yellow, and moves (the matching statements have different neighbors) are highlighted in blue. - Extra information is shown for deletes and updates when the mouse is hovered over the highlighted text (what has been deleted from the original, what the updated code was before). * Design Problem: Need to convert ‘detected clones’ (from clone detection tools) into a format that can be managed proactively. - CSeR currently cannot be applied to imported clones, since it would need to establish correspondences between already modified clones (it currently relies on the clones being identical at first and then incrementally modified).
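To illustrate the color scheme, here is a rough sketch that classifies the changes between an original fragment and its modified paste. It is a deliberate simplification: it compares lists of statement strings with Python's standard `difflib.SequenceMatcher` rather than AST nodes, so it is not how CSeR works, and it cannot recognize moves (blue). The function name and the sample statements are made up for the example.

```python
from difflib import SequenceMatcher

# Mapping from difflib opcode tags to the highlight colors described above.
COLORS = {"insert": "green", "delete": "red", "replace": "yellow"}


def classify_changes(original, modified):
    """Return (color, original_statements, modified_statements) per change."""
    matcher = SequenceMatcher(a=original, b=modified, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged code is not highlighted
        changes.append((COLORS[tag], original[i1:i2], modified[j1:j2]))
    return changes


original = [
    "int low = a[0];",
    "if (a[i] < low) low = a[i];",
    "count++;",
    "log(low);",
    "return low;",
]
modified = [
    "check(a);",
    "int low = a[0];",
    "if (a[i] > low) low = a[i];",
    "count++;",
    "return low;",
]

for color, before, after in classify_changes(original, modified):
    print(color, before, after)
```

The second element of each tuple plays the role of the hover information: it holds what was deleted from the original, or what the updated code was before the change.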
Context Interaction View (Warnings about accidental identifier capture within a clone) - Displays warnings that are given when the pasted code includes externally declared identifiers. These warnings alert the programmer that these particular identifier instances within the clone may need to be renamed. - In this example, the method “more_variables” was copied and pasted (the blue bar shown along the left side shows the programmer that this is a clone). This warning feature of CnP shows the programmer that the fields “v_count”, “variables”, and “STORE_INCR” should be renamed within this clone (there is a yellow, exclamation icon with hover information). Other examples of context interaction views: - Infer commonalities between and within clones (common lexical patterns – left/right). - Alert about ‘unusual’ facts or relationships (inconsistent data and control flow between clones and their contexts).
Clone editing - Inter-clone editing (support for consistent edits between clones) – not currently supported in CnP (other research does ‘simultaneous editing’: LAPIS, Codelink, CloneTracker). - Intra-clone editing (support for consistent edits within a clone) – consistent renaming of identifiers (CReN). - Clone refactoring and removing a clone group (stop tracking a whole group of clones) – not currently supported in CnP (other research does ‘clone detection & removal’).
CReN (Consistent Renaming of Identifiers) - Renames identifier instances together consistently within clones when the programmer renames one of those instances. - In addition to ‘clone tracking’, CReN does ‘identifier tracking’ (CReN groups together identifier instances that bind to the same program element or the same name). - CReN improves the programmer’s coding efficiency and helps prevent the inconsistent renaming errors that can happen when a programmer renames manually (for example, the programmer can miss one of the instances, and the compiler does not detect this because the old name is still in scope, as in the external identifier capture warning example). - Find Range example: the code for finding the lowest integer in an array of integers is already written; the programmer copies and pastes it, then changes the pasted code’s operator to greater than, all instances of “low” to “high”, and all instances of “i” to “j”. This shows CReN renaming all instances of “i” to “j” in the pasted code when one of the instances is modified. - Switching “i” and “j” between nested for loops example (the declarations of “i” and “j” are in the clone): change every “i” to “j”, then change every “j” to “i”. The Rename refactoring in Eclipse changes every “j” to “i” (including those in the other for loop that used to be “i”). CReN, on the other hand, does this switching correctly due to the binding information. * Goal: provide sufficient interaction so that programmers can still be in control, yet minimize how often programmers are forced to correct the tool’s mistakes. - By default, CReN renames all instances of the same identifier together, but it allows the programmer to remove an identifier instance from the group of identifier instances to be renamed together (user control).
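The nested-loop switching example can be sketched as follows. This is an illustrative model only, not CReN's implementation: the `IdentifierInstance` class, the integer `binding` ids, and both rename functions are hypothetical. It contrasts grouping instances by the declaration they bind to with a rename that groups them only by their current name.

```python
from dataclasses import dataclass


@dataclass
class IdentifierInstance:
    name: str
    binding: int          # hypothetical id of the declaration it binds to
    tracked: bool = True  # False once the programmer removes it from the group


def rename_by_binding(tokens, binding, new_name):
    """Rename every tracked instance that binds to the given declaration."""
    for tok in tokens:
        if tok.binding == binding and tok.tracked:
            tok.name = new_name


def rename_by_name(tokens, old_name, new_name):
    """Rename by current name only, shown for contrast."""
    for tok in tokens:
        if tok.name == old_name:
            tok.name = new_name


def nested_loops():
    # Outer loop uses "i" (binding 1); inner loop uses "j" (binding 2).
    return [
        IdentifierInstance("i", 1), IdentifierInstance("i", 1),
        IdentifierInstance("j", 2), IdentifierInstance("j", 2),
    ]


# Binding-based: the two groups stay distinct, so the switch succeeds.
by_binding = nested_loops()
rename_by_binding(by_binding, 1, "j")
rename_by_binding(by_binding, 2, "i")
print([t.name for t in by_binding])  # ['j', 'j', 'i', 'i']

# Name-based: after the first step every instance is "j", so the second
# step turns all of them into "i".
by_name = nested_loops()
rename_by_name(by_name, "i", "j")
rename_by_name(by_name, "j", "i")
print([t.name for t in by_name])     # ['i', 'i', 'i', 'i']
```

Setting `tracked` to `False` on an instance models the user-control feature: that instance is then left alone by later group renames.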
Clone divergence - Clones are modified to the point that they are not similar anymore. - Clones can be removed due to refactoring into an abstraction or other reasons. - After this point there is no remaining clone to be managed, but history of the clone can be recorded and the abstraction managed.
- We ran the clone detection tools CCFinder and/or SimScan on SCL (code developed by one of the authors) and the Eclipse JDT UI source code to find potential clones. - We tried to determine which clones were likely to have been copied and pasted (indicated by the same special comments or by code within close proximity). - One question that we wanted to answer is: To what extent is proactive clone management needed? (The results for both systems showed that PCM is needed.) Clone detection tools: - We have also determined as a result of the study that proactive clone management needs the ability to import detected clones, since the output of clone detection tools can contain clone information that is useful to the programmer as well. - However, we have confirmed that clone detection tools alone are inadequate for clone management – they make the programmer aware of the existence of clones, but are limited in their ability to detect “gapped clones” (clones that are not identical, but which were modified and therefore are only similar). For example, when whole classes are cloned and then modified, clone detection tools detect only smaller code fragments within them as clones, not the whole classes. Proactive support would capture the entire classes as clones.
- We identified the correspondences and changes between class clones in the Eclipse JDT source code and counted the number of changes made to convert one class to a clone class. For example, one of the classes that we looked at needed 33 changes to be made into its clone class. - We found at least 5 clone groups with identifier name changes as the difference between clones. - The tedious comparisons could have been avoided with CSeR. - The identifiers could have been renamed consistently with CReN.
- The second question that we wanted to answer is: How can proactive clone management be designed better? - We had to switch back and forth between multiple Java files or locations just to identify a small difference to report (need a side-by-side view for comparing multiple clones). - We found a case where bigger clones contained smaller clones (need support for ‘clone-overlapping’). - We found cases where whole classes have a similar code structure and vary in fixed locations in a predictable way, e.g. the four arithmetic operations (need a script for modifying new clones based on past changes). - We found clones with ‘symmetrical code patterns’ like “getPreviousPosition()” and “getNextPosition()” (need support for these kinds of patterns). - We found some clones that differed only in a pair of types or a pair of expressions, e.g. only a literal string difference between subclasses (need to show these very small differences to the programmer, so that they are noticed).
Conclusion - PCM tool support is needed throughout the clone life cycle as the clone evolves. - Certain design elements are needed, and others desired, in a PCM environment.
Exploring the Design Space of Proactive Tool Support for Copy-and-Paste Programming Daqing Hou, Ferosh Jacob, and Patricia Jablonski Clarkson University CASCON 2009 November 5, 2009
Code Clone Management <ul><li>Code clones are similar code fragments </li></ul><ul><ul><li>Often a result of copying, pasting, & modifying </li></ul></ul><ul><li>Clone detection & removal (retroactive) </li></ul><ul><ul><li>But clones also need to be managed </li></ul></ul><ul><li>Proactive clone management </li></ul><ul><ul><li>Provide programmers with tools to support & manage clones throughout their entire life cycle (from creation to extinction) </li></ul></ul><ul><ul><li>We made a PCM Eclipse plug-in named CnP </li></ul></ul>
Case Studies of Clones <ul><li>Study the clones in real-world systems </li></ul><ul><ul><li>Used CCFinder and SimScan clone detection tools on SCL and Eclipse JDT UI source code </li></ul></ul><ul><li>1). To what extent is PCM needed? </li></ul><ul><ul><li>For SCL, 50/70 of the intentional, useful clone groups could have been supported by PCM </li></ul></ul><ul><ul><li>For JDT UI, PCM is useful and thus needed </li></ul></ul><ul><ul><li>PCM has to be able to import detected clones, but clone detection tools are not adequate for clone management </li></ul></ul>
Case Studies of Clones <ul><li>Study the clones in real-world systems </li></ul><ul><ul><li>Identified correspondences and changes between class clones (ex. 33 changes) </li></ul></ul><ul><ul><li>Found cases of renamed identifiers in clones </li></ul></ul><ul><li>1). To what extent is PCM needed? </li></ul><ul><ul><li>Tedious comparisons could have been avoided with CSeR </li></ul></ul><ul><ul><li>The identifiers could have been renamed consistently with CReN </li></ul></ul>
Case Studies of Clones <ul><li>2). How can PCM be best designed? </li></ul><ul><ul><li>Side-by-side comparison of multiple clones </li></ul></ul><ul><ul><li>Support for “clone-overlapping” </li></ul></ul><ul><ul><li>Script for modifying new clones based on past changes </li></ul></ul><ul><ul><li>Support for “symmetrical code patterns” </li></ul></ul><ul><ul><li>Make clones with a small number of paired differences (type or expression) visible to the programmer </li></ul></ul>
Conclusion <ul><li>PCM tool support is needed throughout the clone life cycle as the clone evolves </li></ul><ul><li>Certain design elements are needed, and others desired, in a PCM environment </li></ul>