An Empirical Study Of Function Clones In Open Source Software


This is a presentation of a research paper whose authors built a tool called NICAD.

Published in: Technology

  1. An Empirical Study of Function Clones in Open Source Software
     Chanchal K. Roy and James R. Cordy, Queen’s University
     Presenter: MF Khan
  2. Outline
     • Introduction
     • NICAD Overview
     • Experimental Setup
     • Experimental Results
     • Conclusions
     • Discussion
  3. Introduction
     • Code clone (clone)
         • A code fragment reused by copying and pasting, with or without minor modifications
     • Benefits of clone detection
         • Software maintenance (bug detection)
     • History
         • Several detection techniques have been proposed
         • Lack of in-depth comparative studies of cloning across a variety of systems
  4. Introduction (cont.)
     • NICAD
         • In-depth study of function cloning in 15+ C and Java systems, including Apache and the Linux kernel
         • Accurate detection of near-miss function clones
         • Focuses on detecting copy/pasted near-miss clones using pretty-printing, code normalization, and filtering
         • Lightweight: works on simple text lines
         • Capable of detecting clones in very large systems in different languages
  5. NICAD Overview
     • Three phases of clone detection
         • Extraction
             • All potential clones are identified and extracted
             • These are all functions and methods in C and Java, with their original source coordinates
         • Comparison (determination of clones)
             • Potential clones are clustered and compared
             • Pretty-printed potential clones are compared line by line, as text, using the longest common subsequence (LCS)
  6. NICAD Overview
     • Unique Percentage of Items (UPI)
         • If the UPI of both line sequences is zero or below a certain threshold, the potential clones are considered to be clones
     • Reporting
         • Results from NICAD are reported as an XML database and as interactive HTML
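The comparison step on slides 5 and 6 can be illustrated with a minimal sketch. It assumes that UPI is the fraction of a function's pretty-printed lines that are not part of the LCS shared with the other function; the slides do not give the formula, so this definition and the names `lcs_length`, `upi`, and `is_clone_pair` are assumptions for illustration, not NICAD's actual code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    prev = [0] * (len(b) + 1)
    for line_a in a:
        curr = [0]
        for j, line_b in enumerate(b, start=1):
            if line_a == line_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def upi(lines, other):
    """Assumed definition: share of `lines` not covered by the LCS with `other`."""
    if not lines:
        return 0.0
    return (len(lines) - lcs_length(lines, other)) / len(lines)


def is_clone_pair(a, b, threshold):
    """Slide 6: the UPI of BOTH line sequences must be at or below the threshold."""
    return upi(a, b) <= threshold and upi(b, a) <= threshold


# Two hypothetical pretty-printed functions that differ in one line.
f1 = ["int add(int a, int b)", "{", "int sum = a + b;", "return sum;", "}"]
f2 = ["int add(int x, int y)", "{", "int sum = a + b;", "return sum;", "}"]

print(upi(f1, f2))                  # 0.2 (one of five lines is unique)
print(is_clone_pair(f1, f2, 0.3))   # True
print(is_clone_pair(f1, f2, 0.1))   # False
```

Comparing whole lines rather than tokens is what keeps the approach lightweight: after pretty-printing, a small edit changes only the lines it touches, so the LCS still covers the rest of the function.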
  7. Experimental Setup
     • The paper applies NICAD to find function clones in a number of open source systems
     • It then introduces a set of metrics to analyze the results
  8. Experimental Setup
     • Subject systems: 10 C and 7 Java systems
  9. Clone Definition
     • Non-empty functions of at least 3 LOC
     • In pretty-printed format
     • Different Unique Percentage of Items (UPI) thresholds are used to find exact and near-miss clones
     • E.g.
         • UPI threshold 0.0: only exact clones are reported
         • UPI threshold 0.10: two functions are treated as clones when the UPI of neither exceeds the threshold
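A small sketch of how this definition could be applied once the two UPI values of a candidate pair are known. The labels and the names `MIN_LINES`, `is_candidate`, and `classify` are assumptions made for illustration; the slides only state the 3 LOC minimum and the threshold rule.

```python
MIN_LINES = 3  # slide 9: non-empty functions of at least 3 LOC


def is_candidate(pretty_printed_lines):
    """A function is a potential clone only if it has at least MIN_LINES lines."""
    return len(pretty_printed_lines) >= MIN_LINES


def classify(upi_a, upi_b, threshold):
    """Label a pair from the UPI of each function (assumed to be fractions)."""
    worst = max(upi_a, upi_b)
    if worst == 0.0:
        return "exact"        # no unique lines on either side
    if worst <= threshold:
        return "near-miss"    # some unique lines, but within the threshold
    return "not a clone"


print(classify(0.0, 0.0, 0.0))      # exact
print(classify(0.05, 0.10, 0.10))   # near-miss
print(classify(0.05, 0.20, 0.10))   # not a clone
```

With a threshold of 0.0 only the first branch can succeed, which matches the slide's reading that 0.0 yields exact clones only.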
  10. Validation of Clones
      • Validating the detected clones is a two-step process
      • 1: NICAD’s interactive HTML output
          • Gives an overall view of the original source of the clone classes
      • 2: XML output
          • Used to compare, pairwise, the original source of the functions in each clone class
          • Linux diff is used to determine the textual similarity of the original source
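The pairwise check in step 2 can be approximated in a few lines. The slides name Linux diff; the sketch below swaps in Python's standard `difflib` module as a stand-in, so the similarity score it produces is difflib's ratio, not necessarily the measure used in the paper. The two source snippets are hypothetical.

```python
import difflib


def textual_similarity(source_a, source_b):
    """Line-based similarity in [0, 1] between two original sources."""
    a = source_a.splitlines()
    b = source_b.splitlines()
    return difflib.SequenceMatcher(None, a, b).ratio()


def show_diff(source_a, source_b):
    """Unified diff of the two sources, similar in spirit to `diff -u`."""
    lines = difflib.unified_diff(
        source_a.splitlines(), source_b.splitlines(),
        fromfile="a", tofile="b", lineterm="",
    )
    return "\n".join(lines)


src_a = "int f(int x)\n{\n  return x + 1;\n}"
src_b = "int f(int x)\n{\n  return x + 2;\n}"

print(textual_similarity(src_a, src_b))  # 0.75 (3 of 4 lines match)
print(show_diff(src_a, src_b))
```

Running the check on the original source rather than the pretty-printed text gives an independent view of how similar the members of a clone class really are.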
  11. Metrics and Visualizations
      • Total Cloned Methods (TCM)
          • Gives the overall cloning statistics
      • Files Associated with Clones (FAWC)
          • Overall localization of clones
          • From a software maintenance point of view, a lower value of FAWCP is desirable. Why?
          • Because the clones are then localized to certain specific files and may be easier to maintain
          • Still, one can’t say which files contain the majority of clones in the system
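As a rough illustration of these two metrics, the sketch below assumes that TCM is expressed as the percentage of all functions that are cloned, and FAWC as the percentage of files containing at least one cloned function. The slides do not spell out the formulas, so both definitions, the function names, and the sample data are assumptions.

```python
def tcm_percent(functions_by_file):
    """Assumed TCM: cloned functions as a percentage of all functions."""
    total = sum(len(funcs) for funcs in functions_by_file.values())
    cloned = sum(
        1
        for funcs in functions_by_file.values()
        for _name, is_cloned in funcs
        if is_cloned
    )
    return 100.0 * cloned / total if total else 0.0


def fawc_percent(functions_by_file):
    """Assumed FAWC: files with at least one cloned function, as a percentage."""
    total_files = len(functions_by_file)
    with_clones = sum(
        1
        for funcs in functions_by_file.values()
        if any(is_cloned for _name, is_cloned in funcs)
    )
    return 100.0 * with_clones / total_files if total_files else 0.0


# Hypothetical system: file -> list of (function name, is it cloned?)
system = {
    "a.c": [("f", True), ("g", True), ("h", False)],
    "b.c": [("p", False), ("q", False)],
    "c.c": [("r", False)],
    "d.c": [("s", False), ("t", False)],
}

print(tcm_percent(system))   # 25.0 (2 of 8 functions)
print(fawc_percent(system))  # 25.0 (1 of 4 files)
```

In this sample both clones sit in `a.c`, so the FAWC value stays low; spreading the same two clones over two files would double it while leaving TCM unchanged.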
  12. Metrics and Visualizations
      • Cloned Ratio of File for Methods (CRFM)
          • With CRFM we attempt to discover highly cloned files
          • It is computed for a particular file (f)
      • Profile of Cloning Locality w.r.t. Methods (PCLM)
          • Kapser and Godfrey provide three location-based categories of function clones
          • 1: in the same file; 2: in the same directory; 3: in different directories
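A sketch of both ideas, assuming CRFM for a file is the percentage of its functions that are cloned, and that a clone pair's locality is decided by comparing file paths. The formula, the helper names, and the paths are hypothetical, chosen only to illustrate the three location categories.

```python
import posixpath
from collections import Counter


def crfm_percent(funcs):
    """Assumed CRFM(f): cloned functions in one file as a percentage of its functions."""
    if not funcs:
        return 0.0
    cloned = sum(1 for _name, is_cloned in funcs if is_cloned)
    return 100.0 * cloned / len(funcs)


def locality(path_a, path_b):
    """Place a clone pair in one of the three location categories."""
    if path_a == path_b:
        return "same file"
    if posixpath.dirname(path_a) == posixpath.dirname(path_b):
        return "same directory"
    return "different directory"


# Hypothetical clone pairs, given as the files that hold each function.
pairs = [
    ("src/net/tcp.c", "src/net/tcp.c"),
    ("src/net/tcp.c", "src/net/udp.c"),
    ("src/net/tcp.c", "src/fs/inode.c"),
    ("src/fs/inode.c", "src/fs/super.c"),
]

profile = Counter(locality(a, b) for a, b in pairs)
print(profile["same file"], profile["same directory"], profile["different directory"])  # 1 2 1
print(crfm_percent([("f", True), ("g", False)]))  # 50.0
```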
  13. Experimental Results
      • 1. More function cloning in open source Java than in C; on average about 15% (7.2% w.r.t. LOC)
      • 2. The effect of increasing the UPI threshold is almost identical
  14. Detailed Overview
      • 1. Several of the C systems have <10% cloned functions; the Java systems are consistent in their cloning
  15. Clone Associated Files
  16. Clone Associated Files
      • FAWC addresses the question of what portion of the files in a system is associated with clones
      • From a software maintenance point of view, a system with more clones that are associated with only a few files is in some sense better than a system with fewer clones scattered over many files
  17. Profiles of Cloning Density
      • It tells us which files are highly cloned, or which files contain the majority of clones
      • That means scattered files and more near-miss clones
  18. Profile of Cloning Density
      • The assumption is that cloned methods in files with a high clone density have been intentionally copy/pasted
  19. Profile of Cloning Localization
      • The location of a clone pair is a factor in software maintenance
      • Except for Linux, there are no exact clones (UPI threshold 0.0) in C
      • When the UPI threshold is 0.3, on average 45.9% (49.0% LOC) of clone pairs in C occur
  20. Conclusion
      • NICAD is capable of accurately finding
          • Exact function clones
          • Near-miss function clones
  21. Discussion
      • What is the definition of a clone?
      • What is the definition of a near-miss clone?
      • Why is Weltab higher in slide 14?
      • What if we use C++ or C#?
      • What will happen if we use a smaller clone granularity, such as begin-end blocks?
  22. Thank you.