Accurate and Efficient Refactoring Detection in Commit History

Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of state-of-the-art refactoring detection tools, which threatens the reliability of their applications. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stalled, we designed, implemented, and evaluated RefactoringMiner, a technique that overcomes the above limitations. At the heart of RefactoringMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RefactoringMiner, we created the most comprehensive oracle to date, which uses triangulation to create a dataset with considerably reduced bias, representing 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RefactoringMiner has a precision of 98% and a recall of 87%, a significant improvement over the previous state of the art.

Published in: Science

  1. Accurate and Efficient Refactoring Detection in Commit History (May 31, 2018)
     Danny Dig, Nikolaos Tsantalis, Matin Mansouri, Laleh Eshkevari, Davood Mazinanian
  2. Refactoring is noise in evolution analysis
     • Bug-inducing analysis (SZZ): flag refactoring edits as bug-introducing changes
     • Tracing requirements to code: miss traceability links due to refactoring
     • Regression testing: unnecessary execution of tests for refactored code with no behavioral changes
     • Code review/merging: refactoring edits tangled with the actual changes intended by developers
  3. There are many refactoring detection tools
     • Demeyer et al. [OOPSLA’00]
     • UMLDiff + JDevAn [Xing & Stroulia, ASE’05]
     • RefactoringCrawler [Dig et al., ECOOP’06]
     • Weißgerber and Diehl [ASE’06]
     • Ref-Finder [Kim et al., ICSM’10, FSE’10]
     • RefDiff [Silva & Valente, MSR’17]
  4. Limitations of previous approaches
     • Dependence on similarity thresholds
       - thresholds need calibration for projects with different characteristics
     • Dependence on built versions
       - only 38% of the change history can be successfully compiled [Tufano et al., 2017]
     • Unreliable oracles for evaluating precision/recall
       - Incomplete (refactorings found in release notes or commit messages)
       - Biased (applying a single tool with two different similarity thresholds)
       - Artificial (seeded refactorings)
  5. Why do we need better accuracy?
     [Diagram labels: Refactoring detection; Empirical studies; Library adaptation; Framework migration; poor accuracy]
  6. Why do we need better accuracy?
  7. Contributions
     1. First refactoring detection algorithm operating without any code similarity thresholds
     2. RefactoringMiner: open-source tool with an API
     3. Oracle comprising 3,188 refactorings found in 538 commits from 185 open-source projects
     4. Evaluation of precision/recall and comparison with the previous state of the art
     5. Tool infrastructure for comparing multiple refactoring detection tools
  8. Approach in a nutshell: AST-based statement matching algorithm
     • Input: code fragments T1 from the parent commit and T2 from the child commit
     • Output:
       - M: set of matched statement pairs
       - UT1: set of unmatched statements from T1
       - UT2: set of unmatched statements from T2
     • Code changes due to refactoring mechanics: abstraction, argumentization
     • Code changes due to overlapping refactorings or bug fixes: syntax-aware AST node replacements
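To make the input/output contract of the matcher concrete, here is a deliberately simplified Java sketch that produces M, UT1 and UT2 for two statement lists. It is not RefactoringMiner's algorithm: it assumes statements are already-normalized strings and pairs them by exact text equality in one greedy pass; the class `StatementMatching`, the `Result` holder and the `match` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StatementMatching {

    // Hypothetical holder for the three outputs: M, UT1 and UT2.
    static final class Result {
        final List<Map.Entry<String, String>> matched = new ArrayList<>();
        final List<String> unmatchedT1 = new ArrayList<>();
        final List<String> unmatchedT2 = new ArrayList<>();
    }

    // Greedy exact matching on pre-normalized statement text: each T1 statement is
    // paired with the first unused T2 statement with identical text.
    static Result match(List<String> t1, List<String> t2) {
        Result result = new Result();
        boolean[] used = new boolean[t2.size()];
        for (String s1 : t1) {
            int hit = -1;
            for (int j = 0; j < t2.size() && hit < 0; j++) {
                if (!used[j] && t2.get(j).equals(s1)) {
                    hit = j;
                }
            }
            if (hit >= 0) {
                used[hit] = true;
                result.matched.add(Map.entry(s1, t2.get(hit)));
            } else {
                result.unmatchedT1.add(s1);
            }
        }
        // Whatever T2 statements were never paired form UT2.
        for (int j = 0; j < t2.size(); j++) {
            if (!used[j]) {
                result.unmatchedT2.add(t2.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("int x = 0;", "x++;", "return x;");
        List<String> t2 = List.of("int x = 0;", "return x;", "log(x);");
        Result r = match(t1, t2);
        System.out.println("M   = " + r.matched);
        System.out.println("UT1 = " + r.unmatchedT1);
        System.out.println("UT2 = " + r.unmatchedT2);
    }
}
```

Exact equality is exactly what fails once refactoring mechanics or overlapping edits change a statement; the abstraction, argumentization and AST node replacement steps listed on the slide exist to close that gap without falling back on a similarity threshold.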
  9. Running example. Before (parent commit), identical on slides 9-23:
```java
private static Address[] createAddresses(int count) {
    Address[] addresses = new Address[count];
    for (int i = 0; i < count; i++) {
        try {
            addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
    return addresses;
}
```
     On slide 9, After is identical to Before.
  10. After: the return type and the local variable change to List<Address> (intermediate animation frame; the loop body still uses array indexing, so it does not compile as shown):
```java
private static List<Address> createAddresses(int count) {
    List<Address> addresses = new ArrayList<Address>(count);
    for (int i = 0; i < count; i++) {
        try {
            addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
    return addresses;
}
```
  11.-12. After: a new parameter AtomicInteger ports is used instead of PORTS:
```java
private static List<Address> createAddresses(AtomicInteger ports, int count) {
    List<Address> addresses = new ArrayList<Address>(count);
    for (int i = 0; i < count; i++) {
        try {
            addresses[i] = new Address("127.0.0.1", ports.incrementAndGet());
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }
    return addresses;
}
```
  13. After: the try/catch block is lifted out of the for loop, leaving the loop body empty (intermediate animation frame).
  14. After (final child commit): the loop body calls the newly extracted method createAddress:
```java
private static List<Address> createAddresses(AtomicInteger ports, int count) {
    List<Address> addresses = new ArrayList<Address>(count);
    for (int i = 0; i < count; i++) {
        addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
    }
    return addresses;
}

protected static Address createAddress(String host, int port) {
    try {
        return new Address(host, port);
    } catch (UnknownHostException e) {
        e.printStackTrace();
    }
    return null;
}
```
  15. Same code, with createAddress placed next to Before; caption: textual similarity 30%.
  16.-17. Caption: (1) Abstraction.
  18. Caption: (2) Argumentization.
  19. (2) Argumentization: in createAddress, the parameters host and port are replaced by the arguments passed at the call site:
          return new Address("127.0.0.1", ports.incrementAndGet());
  20. Caption: (3) AST Node Replacements (same code as slide 19).
  21. (3) AST Node Replacements: ports is replaced by PORTS:
          return new Address("127.0.0.1", PORTS.incrementAndGet());
  22. Caption: textual similarity = 100%.
  23. Statement mapping (letters A-G label statements of the Before method, digits 1-9 label statements of the After methods):
          M = {(C, 4), (D, 5), (E, 6), (F, 7)}
          UT1 = {A, B, G}
          UT2 = {8}
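The three normalization steps walked through in slides 16-22 can be imitated on the statement pair `addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet());` (parent) and `return new Address(host, port);` (extracted method). The Java sketch below is a string-based approximation written for this one example, not RefactoringMiner's AST-based implementation. `abstraction` keeps only the expression of a `return` or a simple assignment, which is one reading of the slide's "(1) Abstraction"; `argumentize` substitutes call-site arguments for parameter names; `replaceIdentifier` models the `ports` to `PORTS` AST node replacement. All class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementNormalization {

    // (1) Abstraction, string version: keep only the expression of
    // `return expr;` or `target = expr;` so the two forms become comparable.
    static String abstraction(String stmt) {
        String s = stmt.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.startsWith("return ")) {
            return s.substring("return ".length()).trim();
        }
        int eq = s.indexOf(" = "); // naive: assumes a spaced, top-level assignment
        return eq >= 0 ? s.substring(eq + 3).trim() : s;
    }

    // (2) Argumentization: replace each parameter name with the matching call-site
    // argument in a single pass, so inserted text is never substituted again.
    static String argumentize(String stmt, List<String> params, List<String> args) {
        if (params.isEmpty()) {
            return stmt;
        }
        StringBuilder alternatives = new StringBuilder();
        for (String p : params) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(p));
        }
        Matcher m = Pattern.compile("\\b(" + alternatives + ")\\b").matcher(stmt);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String arg = args.get(params.indexOf(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(arg));
        }
        m.appendTail(out);
        return out.toString();
    }

    // (3) AST node replacement, modeled as renaming one whole identifier.
    static String replaceIdentifier(String stmt, String from, String to) {
        return stmt.replaceAll("\\b" + Pattern.quote(from) + "\\b", Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String parent = "addresses[i] = new Address(\"127.0.0.1\", PORTS.incrementAndGet());";
        String child = "return new Address(host, port);";

        String p = abstraction(parent);
        String c = abstraction(child);
        c = argumentize(c, List.of("host", "port"),
                List.of("\"127.0.0.1\"", "ports.incrementAndGet()"));
        c = replaceIdentifier(c, "ports", "PORTS");

        System.out.println("parent: " + p);
        System.out.println("child:  " + c);
        System.out.println("identical after normalization: " + p.equals(c));
    }
}
```

Working on raw strings is fragile (for example, ` = ` inside a string literal would mislead `abstraction`, and `\b` word boundaries only approximate Java identifiers); the slides describe these steps as operating on AST nodes instead.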
  24. Extract Method detection rule
     (M, UT1, UT2) = statement-matching(createAddresses, createAddress)
     M = {(C, 4), (D, 5), (E, 6), (F, 7)}, UT1 = {A, B, G}, UT2 = {8}
     If all of the following hold:
     • createAddress is a newly added method in the child commit
     • createAddresses in the parent commit does not call createAddress
     • createAddresses in the child commit calls createAddress
     • |M| > |UT2| (here 4 > 1)
     then createAddress has been extracted from createAddresses.
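The detection rule on slide 24 is a conjunction of four checks. A minimal Java rendering, assuming a hypothetical `MethodInfo` record that exposes a method's name and the names of the methods it calls, and taking |M| and |UT2| as precomputed counts. It identifies methods by name only, which ignores overloading; a real tool would compare full signatures.

```java
import java.util.Set;

public class ExtractMethodRule {

    // Hypothetical, minimal view of a method as it appears in one commit.
    record MethodInfo(String name, Set<String> calledMethods) {}

    // True when `extracted` is reported as extracted from the source method,
    // following the four conditions of the slide.
    static boolean isExtractMethod(MethodInfo sourceInParent,
                                   MethodInfo sourceInChild,
                                   MethodInfo extracted,
                                   Set<String> methodsInParent,
                                   int matchedCount,        // |M|
                                   int unmatchedT2Count) {  // |UT2|
        boolean newlyAdded = !methodsInParent.contains(extracted.name());
        boolean notCalledBefore = !sourceInParent.calledMethods().contains(extracted.name());
        boolean calledAfter = sourceInChild.calledMethods().contains(extracted.name());
        boolean moreMatchedThanUnmatched = matchedCount > unmatchedT2Count;
        return newlyAdded && notCalledBefore && calledAfter && moreMatchedThanUnmatched;
    }

    public static void main(String[] args) {
        MethodInfo parent = new MethodInfo("createAddresses", Set.of("incrementAndGet", "printStackTrace"));
        MethodInfo child = new MethodInfo("createAddresses", Set.of("add", "createAddress", "incrementAndGet"));
        MethodInfo extracted = new MethodInfo("createAddress", Set.of("printStackTrace"));
        // |M| = 4 and |UT2| = 1, as in the running example.
        boolean result = isExtractMethod(parent, child, extracted, Set.of("createAddresses"), 4, 1);
        System.out.println("createAddress extracted from createAddresses: " + result);
    }
}
```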
  25. Evaluation
     • RQ1: What is the accuracy of RefactoringMiner and how does it compare to the state of the art?
     • RQ2: What is the execution time of RefactoringMiner and how does it compare to the state of the art?
  26. Oracle construction
     • Public dataset with validated refactoring instances: 538 commits from 185 open-source projects [Silva et al., FSE’2016 Distinguished Artifact]
     • We executed two tools: RefactoringMiner and RefDiff [Silva & Valente, MSR’2017]
     • We manually validated all 4,108 detected instances with 3 validators over a period of 3 months
     • Result: 3,188 true positives and 920 false positives
  27. Comparison with the state of the art: RefDiff [Silva & Valente, MSR’2017]
     • Commit-based refactoring detection tool
     • Evaluated on 448 seeded refactorings in 20 open-source projects
     • RefDiff has much higher precision/recall than Ref-Finder [Kim et al. 2010] and RefactoringCrawler [Dig et al. 2006]
     • Ref-Finder and RefactoringCrawler need fully built Eclipse projects as input
  28. Precision and recall per refactoring type (chart not recoverable)
     • RMiner has better precision for all refactoring types
     • For half of the types, RefDiff has better recall
     • Overall, RMiner has +22% precision and +1.5% recall
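The precision and recall on slide 28 are measured against the oracle of slide 26, in which every reported instance was manually labeled as a true or false positive. The Java sketch below applies the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), counting as FN the validated true refactorings that a tool did not report. Refactorings are modeled as plain identifier strings, and the class, record and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class OracleEvaluation {

    record Scores(int tp, int fp, int fn, double precision, double recall) {}

    // reported: instances detected by one tool; oracleTrue: all validated true
    // refactorings in the oracle; oracleFalse: instances validated as false positives.
    static Scores evaluate(Set<String> reported, Set<String> oracleTrue, Set<String> oracleFalse) {
        int tp = 0;
        int fp = 0;
        for (String r : reported) {
            if (oracleTrue.contains(r)) {
                tp++;
            } else if (oracleFalse.contains(r)) {
                fp++;
            }
            // Instances missing from the oracle would need manual validation first.
        }
        Set<String> missed = new HashSet<>(oracleTrue);
        missed.removeAll(reported);
        int fn = missed.size();
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        return new Scores(tp, fp, fn, precision, recall);
    }

    public static void main(String[] args) {
        // Toy data, not taken from the study.
        Set<String> oracleTrue = Set.of("R1", "R2", "R3", "R4");
        Set<String> oracleFalse = Set.of("F1");
        Set<String> reportedByTool = Set.of("R1", "R2", "R3", "F1");
        System.out.println(evaluate(reportedByTool, oracleTrue, oracleFalse));
    }
}
```

Recall computed this way is relative to the refactorings the oracle contains: a true refactoring that no oracle source found cannot show up as a false negative.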
  29. Advantage of RefDiff
     • Treats code fragments as bags of tokens and ignores the structure
```diff
- private void startScanner() throws Exception
+ public void startScanner() throws Exception
  {
-     // check if scanning is enabled
-     if (scanIntervalSeconds <= 0) return;
-     if ( "manual".equalsIgnoreCase( reload ) ) return;
+     if (!isScanningEnabled())
          return;

+ public boolean isScanningEnabled()
+ {
+     if (scanIntervalSeconds <= 0 || "manual".equalsIgnoreCase( reload ))
+         return false;
+     return true;
+ }
```
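To illustrate why a bag-of-tokens view tolerates the restructuring on slide 29 (two `if` statements merged into one condition joined with `||`), the Java sketch below compares the removed and added fragments as token multisets. It uses a plain multiset Jaccard index as a stand-in; RefDiff's own similarity function and token weighting are not reproduced here, and the regular-expression tokenizer is a crude assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenBagSimilarity {

    // Crude tokenizer: string literals, identifiers/numbers, or any single
    // non-whitespace character (so "<=" yields "<" and "=").
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\w+|\\S");

    // Multiset of tokens: token -> occurrence count. Order and nesting are discarded.
    static Map<String, Integer> bag(String code) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    // Multiset Jaccard index: sum of min counts divided by sum of max counts.
    static double jaccard(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        int shared = 0;
        int total = 0;
        for (String k : keys) {
            int x = a.getOrDefault(k, 0);
            int y = b.getOrDefault(k, 0);
            shared += Math.min(x, y);
            total += Math.max(x, y);
        }
        return total == 0 ? 1.0 : (double) shared / total;
    }

    public static void main(String[] args) {
        String removed = "if (scanIntervalSeconds <= 0) return; "
                + "if (\"manual\".equalsIgnoreCase(reload)) return;";
        String added = "if (scanIntervalSeconds <= 0 || \"manual\".equalsIgnoreCase(reload)) "
                + "return false; return true;";
        System.out.printf("token-bag similarity = %.2f%n", jaccard(bag(removed), bag(added)));
    }
}
```

The two fragments share most of their tokens, so the score stays high even though no statement in one is structurally identical to a statement in the other. The same property cuts the other way when the tokens themselves change, as with the renamed variables in the next example.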
  30. Disadvantage of RefDiff
     • Inability to deal with changes in the tokens
```diff
- if (eventBus != null) {
-     StatusChangeEvent event = new StatusChangeEvent(lastRemoteInstanceStatus,
-             currentRemoteInstanceStatus);
-     eventBus.publish(event);
- }
+ onRemoteStatusChanged(lastRemoteInstanceStatus, currentRemoteInstanceStatus);

+ protected void onRemoteStatusChanged(InstanceInfo.InstanceStatus oldStatus, InstanceInfo.InstanceStatus newStatus) {
+     if (eventBus != null) {
+         StatusChangeEvent event = new StatusChangeEvent(oldStatus, newStatus);
+         eventBus.publish(event);
+     }
+ }
```
  31. Execution time per commit in ms (chart not recoverable)
     • At the median, RefactoringMiner is 7 times faster than RefDiff
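Slide 31 compares per-commit execution times through their medians. The Java sketch below shows one way to compute such a comparison, assuming per-commit timings in milliseconds have already been collected for each tool. It computes the ratio of the two medians, which in general differs from the median of per-commit speedups; the slide does not say which of the two it reports. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class MedianSpeedup {

    // Median of per-commit timings; averages the two middle values for even counts.
    static double median(long[] millis) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("no timings");
        }
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Ratio of medians: how many times faster `fast` is than `slow` at the median.
    static double medianSpeedup(long[] slowMillis, long[] fastMillis) {
        return median(slowMillis) / median(fastMillis);
    }

    public static void main(String[] args) {
        // Toy timings in ms per commit, not measurements from the study.
        long[] slowTool = {120, 400, 90, 300, 150};
        long[] fastTool = {20, 60, 15, 50, 30};
        System.out.printf("speedup at the median: %.1fx%n", medianSpeedup(slowTool, fastTool));
    }
}
```

Medians are a sensible summary here because per-commit times are typically skewed by a few very large commits, which would dominate a mean.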
  32. Limitations and future work
     • Missing context: Pull Up is reported as Move if a class between the source and destination is unchanged.
     • Nested refactorings: unable to detect Extract Method applied within an extracted method
     • Unsupported refactorings: types such as Rename Variable/Parameter/Field and Extract/Inline Variable could be supported by analyzing AST replacements.
     • Oracle bias: we plan to add more tools for constructing the oracle (challenge: making tools work without binding information)
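Slide 32 suggests that Rename Variable/Parameter/Field could be inferred from the AST replacements already found while matching statements. A hypothetical Java sketch of that idea: collect the identifier replacements observed across matched statement pairs and report a rename only when an old name is consistently replaced by one and the same new name. The `Replacement` record, the consistency criterion and the method names are illustrative assumptions, not RefactoringMiner's design.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RenameInference {

    // One identifier replacement observed in a matched statement pair (hypothetical).
    record Replacement(String before, String after) {}

    // Report before -> after as a rename only if every observed replacement
    // of `before` uses the same `after` name.
    static Map<String, String> inferRenames(List<Replacement> replacements) {
        Map<String, Set<String>> targets = new HashMap<>();
        for (Replacement r : replacements) {
            targets.computeIfAbsent(r.before(), k -> new HashSet<>()).add(r.after());
        }
        Map<String, String> renames = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : targets.entrySet()) {
            if (e.getValue().size() == 1) {
                renames.put(e.getKey(), e.getValue().iterator().next());
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        List<Replacement> observed = List.of(
                new Replacement("count", "size"),
                new Replacement("count", "size"),
                new Replacement("tmp", "a"),
                new Replacement("tmp", "b")); // inconsistent, so not a rename
        System.out.println(inferRenames(observed));
    }
}
```

A real detector would also have to scope names (a local variable `count` in two different methods is two different variables), which this sketch ignores.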
  33. Conclusions
     • RefactoringMiner: commit-based refactoring detection
       - No similarity thresholds
       - High accuracy: 98% precision, 87% recall
       - Ultra-fast: 58 ms median per commit
       - Better than a competing tool (RefDiff): +22% precision, 7 times faster
     • Largest and least biased refactoring oracle to date
       - 3,188 true refactoring instances
       - 538 commits
       - 185 open-source projects
       - 3 validators over 3 months (9 person-months)
     • http://refactoring.encs.concordia.ca/oracle/
     • https://github.com/tsantalis/RefactoringMiner
