Refactoring Mining
The key to unlock software evolution
Nikolaos Tsantalis
Concordia University
Thank you
IWoR
Who am I
RefactoringMiner
Refactoring
Miner
Alexander Chatzigeorgiou
Eleni Stroulia
Zhenchang Xing
Fabio Rocha
CASCON'13
version 0.0
• Ideas inspired from UMLDiff
• Structure of method bodies is ignored
• Method body includes only method calls and
field accesses
• Most refactorings detected based on
signature matching
• Only precision is provided
• Evaluation included only 3 systems
Refactoring motivations
Why did I stop working on this research?
Ref-Finder (ICSM 2010): "The precision
and recall on open source projects
were 0.74 and 0.96 respectively."
"Since these programs did not document
refactorings, we created a set of correct
refactorings by running REF-FINDER with a
similarity threshold (σ=0.65) and manually
verified them. We then measured a recall
by comparing this set with the results
found using a higher threshold (σ=0.85)"
Lesson #1
Don't be intimidated by super
results
Always question the results
Until one day...
Danilo Silva
Marco Tulio Valente
Where was the code?
Lesson #2
Always make your code & data
available in a repository
You never know what it can
enable in the future
Open Science initiatives
M. T. Baldassarre, N. Ernst, B. Hermann, T. Menzies, R. Yedida
Mandatory
Data Availability
field
Let's do an empirical study about refactoring
We must study
Why We Refactor
ICSE'16
version 0.1
• Danilo developed the API of RefactoringMiner
• Tooling for checking out and parsing Git commits
• Infrastructure for monitoring GitHub projects
• Automatic generation of emails to contact developers
• A web app for thematic analysis
Firehouse interview
• Monitored 124 GitHub projects between June 8th and August 7th, 2015
• Sent 465 emails and received 195 responses (42%)
• +27 commits with a description explaining the reasons
• Compiled a catalogue of 44 distinct motivations for 12
well-known refactoring types
Motivation
Catalogue
Artifact
https://github.com/aserg-ufmg/why-we-refactor
ICSE'16 rejection
Reviewer #1: "A major threat to the research is not discussed or
considered, that RefFinder has poor recall (0.24 [31]). The authors did a
good job of combating the low-precision by manually inspecting
results, the low recall is not discussed or dealt with."
Lesson #3
Even ICSE reviewers make
mistakes
Don't get disappointed.
Improve, advance, re-submit
FSE'16 re-submission
• Renamed the tool from RefDetector to RefactoringMiner
• Addressed most of reviewers' comments
RefDiff MSR'17
RefactoringMiner ...
The
path
of
Virtue
or
Vice
Rebirth
ICSE’18
version 1.0
Limitations of previous approaches
1. Dependence on similarity thresholds
• thresholds need calibration for projects with different characteristics
2. Dependence on built versions
• only 38% of the change history can be successfully compiled [Tufano et al., 2017]
3. Unreliable oracles for evaluating precision/recall
• Incomplete (refactorings found in release notes or commit messages)
• Biased (applying a single tool with two different similarity thresholds)
• Artificial (seeded refactorings)
Thresholds are banned
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
After
Before
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(int count) {
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
After
Before
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
After
Before
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
}
return addresses;
}
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
After
Before
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
After
Before
textual similarity ≈ 30%
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
(1) Abstraction
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
(2) Argumentization
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
(2) Argumentization
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
(3) AST Node Replacements
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
(3) AST Node Replacements
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
After
Before
textual similarity = 100%
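The three steps shown in the slides can be approximated on plain strings. The sketch below is only an illustration: it uses string and regular-expression operations instead of real AST nodes, and the class and method names (`StatementMatchingSketch`, `abstractExpression`, `argumentize`, `replaceNode`) are hypothetical, not part of RefactoringMiner's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementMatchingSketch {

    // (1) Abstraction: drop the statement wrapper (assignment target or
    // "return") and keep only the expression it carries.
    // Simplification: the first '=' is assumed to be the assignment operator.
    static String abstractExpression(String statement) {
        String s = statement.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.startsWith("return ")) {
            return s.substring("return ".length()).trim();
        }
        int eq = s.indexOf('=');
        return eq >= 0 ? s.substring(eq + 1).trim() : s;
    }

    // (2) Argumentization: substitute each parameter of the extracted method
    // with the argument passed at the call site.
    static String argumentize(String expression, Map<String, String> parameterToArgument) {
        String result = expression;
        for (Map.Entry<String, String> e : parameterToArgument.entrySet()) {
            result = replaceNode(result, e.getKey(), e.getValue());
        }
        return result;
    }

    // (3) Replacement: swap one identifier for another (whole-word match).
    static String replaceNode(String expression, String from, String to) {
        return expression.replaceAll("\\b" + Pattern.quote(from) + "\\b",
                Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        // Statement from the original method (before the commit).
        String before = abstractExpression(
                "addresses[i] = new Address(\"127.0.0.1\", PORTS.incrementAndGet());");
        // Statement from the extracted method (after the commit).
        String after = abstractExpression("return new Address(host, port);");

        // Parameters of createAddress mapped to the arguments at its call site.
        Map<String, String> parameterToArgument = new LinkedHashMap<>();
        parameterToArgument.put("host", "\"127.0.0.1\"");
        parameterToArgument.put("port", "ports.incrementAndGet()");
        after = argumentize(after, parameterToArgument);

        // The static field PORTS was replaced by the parameter ports.
        after = replaceNode(after, "ports", "PORTS");

        System.out.println(before.equals(after)); // prints "true"
    }
}
```

After the three steps the two statements are textually identical, so no similarity threshold is needed to pair them.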
Reliable oracle
• We used the “Why We Refactor” dataset
(538 commits from 185 open source projects)
• We executed both available tools (RefactoringMiner and RefDiff)
• Converted the output of the tools to the same format
• We manually validated all refactoring instances (4,108 unique instances,
out of which 3,188 were true positives and 920 were false positives)
• The validation process was labor-intensive and involved 3 validators for
a period of 3 months (i.e., 9 person-months)
• To compute recall, we considered the union of the true positives
reported by both tools as the ground truth.
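A minimal sketch of how precision and recall follow from such an oracle, assuming each refactoring instance is identified by a string ID. The class name, method names, and sample IDs are hypothetical and only serve the illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OracleMetricsSketch {

    // Ground truth: every instance reported by at least one tool
    // that the validators confirmed as a true refactoring.
    static Set<String> groundTruth(Set<String> toolA, Set<String> toolB,
                                   Set<String> validatedTrue) {
        Set<String> truth = new HashSet<>(toolA);
        truth.addAll(toolB);
        truth.retainAll(validatedTrue);
        return truth;
    }

    // Number of instances reported by a tool that belong to the ground truth.
    static int truePositives(Set<String> reported, Set<String> truth) {
        Set<String> tp = new HashSet<>(reported);
        tp.retainAll(truth);
        return tp.size();
    }

    static double precision(Set<String> reported, Set<String> truth) {
        return reported.isEmpty() ? 0.0
                : (double) truePositives(reported, truth) / reported.size();
    }

    static double recall(Set<String> reported, Set<String> truth) {
        return truth.isEmpty() ? 0.0
                : (double) truePositives(reported, truth) / truth.size();
    }

    public static void main(String[] args) {
        // Hypothetical tool outputs; "x1" is a false positive of tool A.
        Set<String> toolA = new HashSet<>(Arrays.asList("r1", "r2", "r3", "x1"));
        Set<String> toolB = new HashSet<>(Arrays.asList("r2", "r3", "r4"));
        Set<String> validatedTrue = new HashSet<>(Arrays.asList("r1", "r2", "r3", "r4"));

        Set<String> truth = groundTruth(toolA, toolB, validatedTrue);
        System.out.println(precision(toolA, truth)); // prints 0.75
        System.out.println(recall(toolA, truth));    // prints 0.75
        System.out.println(precision(toolB, truth)); // prints 1.0
        System.out.println(recall(toolB, truth));    // prints 0.75
    }
}
```

A refactoring missed by every tool never enters this ground truth, so recall computed this way is relative to what the tools collectively found.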
Matin Mansouri, Laleh Eshkevari, Davood Mazinanian
ICSE’18 retrospective
• Our solution was still far from perfect
• Time pressure
• It was urgent to establish RefactoringMiner with a publication
• ICSE deadline: August 25, 2017
• Sophia’s birth: August 15, 2017
• Despite working for over a year on this paper, there was still room for
improvement
• The entire team graduated after this work
Wisdom
TSE’20
version 2.0
• Added more replacement types
• 20 new sub-method level refactoring types
detected based on AST node replacements
• Nested refactoring detection
• Refactoring inference
• Improved the matching of method calls to
method declarations with argument type
inference
Even more reliable oracle
• We executed all available tools
• RefactoringMiner 1.0 and 2.0
• RefDiff 0.1.1, 1.0, 2.0
• GumTreeDiff 2.1.2
• Converted the output of the tools to the same format
• We validated 5,830 new unique refactoring instances, out of which
4,038 were true positives and 1,792 were false positives.
• 7,226 true positives in total for 40 different refactoring types
(72% of true instances are detected by two or more tools)
Ameya Ketkar
Emperor
version 2.2
From loop and if to
Stream API .forEach() and .filter()
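As an illustration of the kind of migration meant here, the sketch below shows one method written with a loop and an if, and the same logic expressed with `.filter()` and `.forEach()`. The class, method names, and data are hypothetical; the slide does not show which concrete code was migrated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopToStreamSketch {

    // Before: explicit loop with a conditional inside.
    static List<String> longNamesLoop(List<String> names, int minLength) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() >= minLength) {
                result.add(name);
            }
        }
        return result;
    }

    // After: the if becomes .filter() and the loop body becomes .forEach().
    static List<String> longNamesStream(List<String> names, int minLength) {
        List<String> result = new ArrayList<>();
        names.stream()
             .filter(name -> name.length() >= minLength)
             .forEach(result::add);
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Alexander", "Eleni");
        // Both versions select the same elements in the same order.
        System.out.println(longNamesLoop(names, 5).equals(longNamesStream(names, 5))); // prints "true"
    }
}
```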
Current status
• 85 supported refactoring types
• 10,900 validated true positives in the oracle
• Precision: 99.7%
• Recall: 97%
• Most tested and reliable version (200K commits without exception)
• Independent studies confirm it has the best precision
Lesson #4
What it takes to make a reliable & usable tool
Minimum 5 years of research and development
Extensive testing
Detailed documentation (README with API code snippets)
Supporting users (200+ issues resolved)
Stable project leader
Great team
The
promises
Promise #1
“researchers can replicate existing empirical studies
and refute or confirm previously-held beliefs.”
Threats to validity
• Both studies relied on Ref-Finder
• Independent studies revealed that Ref-Finder had low precision/recall
• 35% precision, 24% recall [Soares et al. JSS 2013]
• 27% precision [Kádár et al. PROMISE 2016]
• Ref-Finder paper claimed 74% precision, 96% recall
• Collected refactoring info at release level (between two versions)
• Coarse-grained analysis led to strong and unsafe assumptions
Lesson #5
A tool comes with a huge
responsibility for its authors
We should go above and beyond
when evaluating its accuracy
Promise #2
“reduce the noise created by refactorings, such as
file/directory renaming, and significantly improve the
accuracy of other tools.”
Lesson #6
Making existing code evolution analysis techniques
refactoring-aware can improve their accuracy
Promise #3
“our oracle of true refactorings from 538 commits across 185
projects provides an invaluable resource for validating novel
refactoring tools and for comparing existing approaches”
Promise #4
“enable online refactoring detection on partial input, when a
developer inspects a code diff to review a change, or tries
to understand code evolution selectively”
• Partially refactoring-aware
• Supports method signature changes
• Method moves
• File renames/moves
• Uses thresholds when comparing
program elements, calibrated on a
training set of 100 methods
• Mismatches methods from which a
significant part of their body has been
extracted to new methods, as it uses a
75% body similarity threshold to match
modified methods
• Ultra-fast: Less than 2 seconds to fetch
the entire change history of a method
Challenge
How can we take advantage of RefactoringMiner's accuracy
in a way that is not computationally expensive?
Mehran Jodavi
Solution
Partial and incremental commit analysis based on
the location of the tracked program element
Precision +7.5%
Recall +3.7%
Precision +9.1%
Recall +5.8%
> 97% precision/recall change level
Method tracking commit level
Method tracking change level
Variable tracking
> 98% precision/recall commit level
Promise #5
“Integrating RefactoringMiner with the diff and code review
tools can raise the level of abstraction for code changes
originating from refactorings, thus helping developers better
understand the code evolution”
Diff Hunk Graph
Nodes: diff hunks
Hard links: syntactic constraints between the AST nodes
Soft links: repetitive change patterns
Refactoring links: edits in different locations originating
from the same refactoring
Cosmetic links: changes in comments & reformatting
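One way to picture this structure is as a graph with typed edges. The sketch below is an assumption about how such a graph could be represented, not the implementation behind the slide; the class names, fields, and file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DiffHunkGraphSketch {

    // The four link kinds named on the slide.
    enum LinkType { HARD, SOFT, REFACTORING, COSMETIC }

    // Node: one diff hunk, identified here by file and line range (assumed fields).
    static final class DiffHunk {
        final String file;
        final int startLine;
        final int endLine;

        DiffHunk(String file, int startLine, int endLine) {
            this.file = file;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    // Edge: an undirected, typed link between two hunks.
    static final class Link {
        final DiffHunk a;
        final DiffHunk b;
        final LinkType type;

        Link(DiffHunk a, DiffHunk b, LinkType type) {
            this.a = a;
            this.b = b;
            this.type = type;
        }
    }

    private final List<DiffHunk> hunks = new ArrayList<>();
    private final List<Link> links = new ArrayList<>();

    DiffHunk addHunk(String file, int startLine, int endLine) {
        DiffHunk hunk = new DiffHunk(file, startLine, endLine);
        hunks.add(hunk);
        return hunk;
    }

    void link(DiffHunk a, DiffHunk b, LinkType type) {
        links.add(new Link(a, b, type));
    }

    // Hunks connected to the given hunk by a link of the given type.
    List<DiffHunk> neighbours(DiffHunk hunk, LinkType type) {
        List<DiffHunk> result = new ArrayList<>();
        for (Link l : links) {
            if (l.type != type) {
                continue;
            }
            if (l.a == hunk) {
                result.add(l.b);
            } else if (l.b == hunk) {
                result.add(l.a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DiffHunkGraphSketch graph = new DiffHunkGraphSketch();
        // Two edits in different locations that stem from the same Extract Method.
        DiffHunk extractedMethod = graph.addHunk("AddressFactory.java", 40, 48);
        DiffHunk callSite = graph.addHunk("AddressFactory.java", 20, 24);
        graph.link(extractedMethod, callSite, LinkType.REFACTORING);

        System.out.println(graph.neighbours(extractedMethod, LinkType.REFACTORING).size()); // prints 1
    }
}
```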
Promise #6
“refactoring operations can be automatically documented at
commit-time to provide a more detailed description of the
applied changes in the commit message”
Mining for
Learning
File f = new File(:[v0], :[v1]) → Path f = :[v0].resolve(:[v1])
FileOutputStream stream = new FileOutputStream(:[v0]) → OutputStream stream = Files.newOutputStream(:[v0])
Ameya Ketkar
Rewrite rules
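To make the two rules concrete, the sketch below applies them by hand to a small method, with `:[v0]` and `:[v1]` standing for the matched sub-expressions. The class and method names are hypothetical, and the example assumes the directory argument has itself been migrated from `File` to `Path`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TypeMigrationSketch {

    // Before: java.io.File and FileOutputStream.
    static void writeWithFile(File dir, String name, String text) throws IOException {
        File f = new File(dir, name);
        try (FileOutputStream stream = new FileOutputStream(f)) {
            stream.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // After: both rewrite rules applied, using java.nio.file.Path and Files.
    static void writeWithPath(Path dir, String name, String text) throws IOException {
        Path f = dir.resolve(name);
        try (OutputStream stream = Files.newOutputStream(f)) {
            stream.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("migration-sketch");
        writeWithFile(dir.toFile(), "old.txt", "hello");
        writeWithPath(dir, "new.txt", "hello");

        // Both versions should leave the same bytes on disk.
        byte[] oldBytes = Files.readAllBytes(dir.resolve("old.txt"));
        byte[] newBytes = Files.readAllBytes(dir.resolve("new.txt"));
        System.out.println(Arrays.equals(oldBytes, newBytes)); // prints "true"
    }
}
```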
Mining for
Recommending
Why We Refactor++
A large-scale empirical study
• We developed detection rules for refactoring motivations
• We optimized the rules on the FSE’16 “Why We Refactor” dataset
• We validated their accuracy on the TOSEM’2020 dataset
• Precision: 98.4% Recall: 93.5%
• We collected the motivations for 346K Extract Method instances
found in 132,897 commits of 325 open-source repositories
Sadegh Aalizadeh
J. Pantiuchina, F. Zampetti, S. Scalabrino, V. Piantadosi, R. Oliveto, G. Bavota, and M. Di Penta,
"Why Developers Refactor Source Code: A Mining-based Study,"
ACM Transactions on Software Engineering and Methodology, Volume 29, Issue 4, Article 29, September 2020.
Refactoring recommendation tools for
top-5 Extract Method motivations
1. Reusable Method [NONE]
2. Remove Duplication [CeDAR (Tairas and Gray, IST’12), Creios (Hotta et al.,
CSMR’12), Rase (Meng et al., ICSE’15), JDeodorant (Tsantalis et al., TSE’15 +
ICSE’17), CRec (Yue et al., ICSME’18)]
3. Facilitate Extension [FR-Refactor (Nyamawe et al., RE’19 + EMSE’20)]
4. Decompose for Readability [JDeodorant (Tsantalis and Chatzigeorgiou,
JSS’11), JExtract (Silva et al., ICPC’14), SEMI (Charalampidou et al., TSE’17),
GEMS (Xu et al., ISSRE’17)]
5. Alternative Signature [NONE]
Mining for
Accuracy
Making better refactoring mining tools is
a community effort
Mining competitions hosted in the Workshop
Community-created benchmarks
Hackathons for improving tools
https://github.com/tsantalis/RefactoringMiner
https://github.com/JetBrains-Research/RefactorInsight
https://github.com/JetBrains-Research/kotlinRMiner
https://github.com/JetBrains-Research/data-driven-type-migration
Oleg Smirnov, Ameya Ketkar, Zarina Kurbatova
https://github.com/jodavimehran/code-tracker
Mehran Jodavi

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 

Recently uploaded (20)

Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 

Refactoring Mining - The key to unlock software evolution

  • 1. Refactoring Mining The key to unlock software evolution Nikolaos Tsantalis Concordia University
  • 7. • Ideas inspired from UMLDiff • Structure of method bodies is ignored • Method body includes only method calls and field accesses • Most refactorings detected based on signature matching • Only precision is provided • Evaluation included only 3 systems
  • 10. Why did I stop working on this research? Ref-Finder (ICSM 2010): "The precision and recall on open source projects were 0.74 and 0.96 respectively." "Since these programs did not document refactorings, we created a set of correct refactorings by running REF-FINDER with a similarity threshold (σ=0.65) and manually verified them. We then measured a recall by comparing this set with the results found using a higher threshold (σ=0.85)"
  • 11. Lesson #1 Don't be intimidated by super results Always question the results
  • 12. Until one day... Danilo Silva Marco Tulio Valente
  • 13. Where was the code?
  • 14. Lesson #2 Always make your code & data available in a repository You never know what it can enable in the future
  • 15. Open Science initiatives M. T. Baldassarre, N. Ernst, B. Hermann, T. Menzies, R. Yedida Mandatory Data Availability field
  • 16. Let's do an empirical study about refactoring We must study Why We Refactor
  • 18. • Danilo developed the API of RefactoringMiner • Tooling for checking out and parsing Git commits • Infrastructure for monitoring GitHub projects • Automatic generation of emails to contact developers • A web app for thematic analysis
  • 19. Firehouse interview • Monitored 124 GitHub projects between June 8th and August 7th, 2015 • Sent 465 emails and received 195 responses (42%) • +27 commits with a description explaining the reasons • Compiled a catalogue of 44 distinct motivations for 12 well-known refactoring types
  • 21. ICSE'16 rejection Reviewer #1: "A major threat to the research is not discussed or considered, that RefFinder has poor recall (0.24 [31]). The authors did a good job of combating the low-precision by manually inspecting results, the low recall is not discussed or dealt with."
  • 22. Lesson #3 Even ICSE reviewers make mistakes Don't get disappointed. Improve, advance, re-submit
  • 23. FSE'16 re-submission • Renamed the tool from RefDetector to RefactoringMiner • Addressed most of the reviewers' comments
  • 28. Limitations of previous approaches 1. Dependence on similarity thresholds • thresholds need calibration for projects with different characteristics 2. Dependence on built versions • only 38% of the change history can be successfully compiled [Tufano et al., 2017] 3. Unreliable oracles for evaluating precision/recall • Incomplete (refactorings found in release notes or commit messages) • Biased (applying a single tool with two different similarity thresholds) • Artificial (seeded refactorings)
  • 30. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After (initially identical): private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
  • 31. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
  • 32. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
  • 34. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { } return addresses; } Statements moved out of the loop: try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); }
  • 35. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 36. Textual similarity 30%. Extracted method: protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; }
  • 37. (1) Abstraction. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 39. (2) Argumentization. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 40. (2) Argumentization, parameters replaced by call-site arguments. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 41. (3) AST Node Replacements. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 42. (3) AST Node Replacements, ports replaced by PORTS. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
  • 43. Textual similarity = 100%. Before: private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } After: private static List<Address> createAddresses(AtomicInteger ports, int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } Extracted method: protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; }
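The three matching steps on slides 37 to 43 can be illustrated with a small sketch. This is not RefactoringMiner's implementation: it works on plain strings instead of AST nodes, and the names `StatementMatchingSketch`, `abstractExpression`, `argumentize` and `replaceIdentifier` are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, string-based stand-in for AST-level statement matching.
class StatementMatchingSketch {

    // (1) Abstraction: keep only the expression, ignoring whether it is
    // returned or assigned. Assumes a single-line "return x;" or "lhs = x;".
    static String abstractExpression(String statement) {
        String s = statement.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1);
        }
        if (s.startsWith("return ")) {
            return s.substring("return ".length()).trim();
        }
        int assignment = s.indexOf(" = ");
        return assignment >= 0 ? s.substring(assignment + 3).trim() : s;
    }

    // (2) Argumentization: replace each parameter of the extracted method
    // with the argument passed at the call site.
    static String argumentize(String statement, Map<String, String> parameterToArgument) {
        String result = statement;
        for (Map.Entry<String, String> entry : parameterToArgument.entrySet()) {
            result = replaceIdentifier(result, entry.getKey(), entry.getValue());
        }
        return result;
    }

    // (3) Replacement: substitute one whole identifier with another.
    static String replaceIdentifier(String statement, String from, String to) {
        return statement.replaceAll("\\b" + Pattern.quote(from) + "\\b",
                Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String before = "addresses[i] = new Address(\"127.0.0.1\", PORTS.incrementAndGet());";
        String after = "return new Address(host, port);";

        // Arguments taken from the call createAddress("127.0.0.1", ports.incrementAndGet()).
        Map<String, String> callSite = new LinkedHashMap<>();
        callSite.put("host", "\"127.0.0.1\"");
        callSite.put("port", "ports.incrementAndGet()");

        String left = abstractExpression(before);
        String right = replaceIdentifier(
                argumentize(abstractExpression(after), callSite), "ports", "PORTS");
        System.out.println(left.equals(right));
    }
}
```

After the three steps both sides reduce to the same expression text, which is the situation slide 43 labels as 100% textual similarity. No similarity threshold is involved at any point.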
  • 44. Reliable oracle • We used the “Why We Refactor” dataset (538 commits from 185 open source projects) • We executed both available tools (RefactoringMiner and RefDiff) • Converted the output of the tools to the same format • We manually validated all refactoring instances (4,108 unique instances, out of which 3,188 were true positives and 920 were false positives) • The validation process was labor-intensive and involved 3 validators for a period of 3 months (i.e., 9 person-months) • To compute recall, we considered the union of the true positives reported by both tools as the ground truth. Matin Mansouri Laleh Eshkevari Davood Mazinanian
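The oracle arithmetic on this slide can be sketched as follows. Pooled over both tools, 3,188 of the 4,108 unique reported instances were validated as true, roughly 77.6%. The per-tool computation below uses made-up refactoring identifiers (`r1` to `r5`) and a hypothetical class name; only the method (manual validation for precision, union of true positives as ground truth for recall) comes from the slide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of precision/recall against a union-based oracle.
class OracleMetricsSketch {

    // Instances reported by a tool that were manually validated as true.
    static Set<String> truePositives(Set<String> reported, Set<String> validatedTrue) {
        Set<String> result = new HashSet<>(reported);
        result.retainAll(validatedTrue);
        return result;
    }

    // Ground truth for recall: union of the true positives of both tools.
    static Set<String> unionGroundTruth(Set<String> truePositivesA, Set<String> truePositivesB) {
        Set<String> result = new HashSet<>(truePositivesA);
        result.addAll(truePositivesB);
        return result;
    }

    static double precision(Set<String> reported, Set<String> validatedTrue) {
        if (reported.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(reported, validatedTrue).size() / reported.size();
    }

    static double recall(Set<String> reported, Set<String> groundTruth) {
        if (groundTruth.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(reported, groundTruth).size() / groundTruth.size();
    }

    public static void main(String[] args) {
        // Illustrative data only, not taken from the study.
        Set<String> toolA = new HashSet<>(Arrays.asList("r1", "r2", "r3", "r4"));
        Set<String> toolB = new HashSet<>(Arrays.asList("r3", "r4", "r5"));
        Set<String> validatedTrue = new HashSet<>(Arrays.asList("r1", "r3", "r4", "r5"));

        Set<String> truth = unionGroundTruth(
                truePositives(toolA, validatedTrue), truePositives(toolB, validatedTrue));
        System.out.println("A: precision=" + precision(toolA, validatedTrue)
                + " recall=" + recall(toolA, truth));
        System.out.println("B: precision=" + precision(toolB, validatedTrue)
                + " recall=" + recall(toolB, truth));
    }
}
```

A refactoring that neither tool reports never enters such a ground truth, so recall computed this way is an upper bound on the true recall.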
  • 47. ICSE’18 retrospective • Our solution was still far from perfect • Time pressure • It was urgent to establish RefactoringMiner with a publication • ICSE deadline: August 25, 2017 • Sophia’s birth: August 15, 2017 • Despite working over 1 year on this paper, there was still space for improvement • The entire team graduated after this work
  • 49. • Added more replacement types • 20 new sub-method level refactoring types detected based on AST node replacements • Nested refactoring detection • Refactoring inference • Improved the matching of method calls to method declarations with argument type inference
  • 50. Even more reliable oracle • We executed all available tools • RefactoringMiner 1.0 and 2.0 • RefDiff 0.1.1, 1.0, 2.0 • GumTreeDiff 2.1.2 • Converted the output of the tools to the same format • We validated 5,830 new unique refactoring instances, out of which 4,038 were true positives and 1,792 were false positives. • 7,226 true positives in total for 40 different refactoring types (72% of true instances are detected by two or more tools) Ameya Ketkar
  • 54. From loop and if to Stream API .forEach() and .filter()
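A minimal illustration of the kind of migration this slide names, assuming a simple filtering loop. The class and method names are hypothetical and the example is not taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical before/after pair for a loop-and-if to Stream API migration.
class StreamMigrationSketch {

    // Before: an enhanced for loop with a nested if.
    static List<String> beforeLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!name.isEmpty()) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // After: the if becomes .filter() and the loop body becomes .forEach().
    static List<String> afterStream(List<String> names) {
        List<String> result = new ArrayList<>();
        names.stream()
                .filter(name -> !name.isEmpty())
                .forEach(name -> result.add(name.toUpperCase()));
        return result;
    }
}
```

The two versions share almost no statement text even though their behavior is the same, which is why such migrations are hard to match with textual similarity alone.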
  • 55. Current status • 85 supported refactoring types • 10,900 validated true positives in the oracle • Precision: 99.7% • Recall: 97% • Most tested and reliable version (200K commits without exception) • Independent studies confirm it has the best precision
  • 56. Lesson #4 What it takes to make a reliable & usable tool Minimum 5 years of research and development Extensive testing Detailed documentation (README with API code snippets) Supporting users (200+ issues resolved) Stable project leader Great team
  • 58. Promise #1 “researchers can replicate existing empirical studies and refute or confirm previously-held beliefs.”
  • 60. Threats to validity • Both studies relied on Ref-Finder • Independent studies revealed that Ref-Finder had low precision/recall • 35% precision, 24% recall [Soares et al. JSS 2013] • 27% precision [Kádár et al. PROMISE 2016] • Ref-Finder paper claimed 74% precision, 96% recall • Collected refactoring info at release level (between two versions) • Coarse-grained analysis led to strong and unsafe assumptions
  • 62. Lesson #5 A tool comes with a huge responsibility for its authors We should go above and beyond when evaluating its accuracy
  • 63. Promise #2 “reduce the noise created by refactorings, such as file/directory renaming, and significantly improve the accuracy of other tools.”
  • 65. Lesson #6 Making existing code evolution analysis techniques refactoring-aware can improve their accuracy
  • 66. Promise #3 “our oracle of true refactorings from 538 commits across 185 projects provides an invaluable resource for validating novel refactoring tools and for comparing existing approaches”
  • 68. Promise #4 “enable online refactoring detection on partial input, when a developer inspects a code diff to review a change, or tries to understand code evolution selectively”
  • 69. • Partially refactoring-aware • Supports method signature changes • Method moves • File renames/moves • Uses thresholds when comparing program elements, calibrated on a training set of 100 methods • Mismatches methods from which a significant part of their body has been extracted to new methods, as it uses a 75% body similarity threshold to match modified methods • Ultra-fast: Less than 2 seconds to fetch the entire change history of a method
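The mismatch described on this slide can be sketched with a toy similarity measure. The slide gives only the 75% threshold; the statement-overlap ratio, the class name and the sample statements below are assumptions made for illustration, not the tool's actual metric.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical body-similarity check with a fixed threshold.
class BodySimilaritySketch {

    static final double THRESHOLD = 0.75; // value quoted on the slide

    // Assumed metric: statements of the old body still present in the new
    // body, divided by the size of the larger body.
    static double similarity(List<String> before, List<String> after) {
        int larger = Math.max(before.size(), after.size());
        if (larger == 0) {
            return 1.0;
        }
        Set<String> afterSet = new HashSet<>(after);
        int common = 0;
        for (String statement : before) {
            if (afterSet.contains(statement)) {
                common++;
            }
        }
        return (double) common / larger;
    }

    static boolean matches(List<String> before, List<String> after) {
        return similarity(before, after) >= THRESHOLD;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
                "a();", "b();", "c();", "d();", "e();", "f();", "g();", "h();");
        // Six statements were extracted into a new method extracted().
        List<String> after = Arrays.asList("a();", "extracted();", "h();");
        System.out.println(similarity(before, after) + " matched=" + matches(before, after));
    }
}
```

With six of eight statements extracted, only two survive in place, so the method falls well below the threshold and its history would be cut at that commit.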
  • 70. Challenge How can we take advantage of RefactoringMiner accuracy in a way that is not computationally expensive? Mehran Jodavi Solution Partial and incremental commit analysis based on the location of the tracked program element
  • 71. Method tracking (commit level and change level) and variable tracking. Reported figures: Precision +7.5%, Recall +3.7%; Precision +9.1%, Recall +5.8%; > 97% precision/recall at change level; > 98% precision/recall at commit level
  • 72. Promise #5 “Integrating RefactoringMiner with the diff and code review tools can raise the level of abstraction for code changes originating from refactorings, thus helping developers better understand the code evolution”
  • 73. Diff Hunk Graph Nodes: diff hunks Hard links: syntactic constraints between the AST nodes Soft links: repetitive change patterns Refactoring links: edits in different locations originating from the same refactoring Cosmetic links: changes in comments & reformatting
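The graph described on this slide could be represented along these lines. This is a minimal sketch, assuming hunks are identified by a string such as a file name plus line range; the class, the `LinkType` enum and the method names are hypothetical, and only the four link kinds come from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical data structure for a diff hunk graph with typed links.
class DiffHunkGraphSketch {

    // The four link kinds listed on the slide.
    enum LinkType { HARD, SOFT, REFACTORING, COSMETIC }

    static final class Link {
        final String from;
        final String to;
        final LinkType type;

        Link(String from, String to, LinkType type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    private final List<Link> links = new ArrayList<>();

    void addLink(String from, String to, LinkType type) {
        links.add(new Link(from, to, type));
    }

    // Hunks directly connected to the given hunk by a link of the given
    // type; links are treated as undirected.
    Set<String> neighbors(String hunk, LinkType type) {
        Set<String> result = new TreeSet<>();
        for (Link link : links) {
            if (link.type != type) {
                continue;
            }
            if (link.from.equals(hunk)) {
                result.add(link.to);
            } else if (link.to.equals(hunk)) {
                result.add(link.from);
            }
        }
        return result;
    }
}
```

Querying the neighbors of a hunk over `REFACTORING` links would group edits in different locations that originate from the same refactoring, which is the grouping the slide describes.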
  • 74. Promise #6 “refactoring operations can be automatically documented at commit-time to provide a more detailed description of the applied changes in the commit message”
  • 79. Rewrite rules (Ameya Ketkar): File f = new File(:[v0], :[v1]) → Path f = :[v0].resolve(:[v1]); FileOutputStream stream = new FileOutputStream(:[v0]) → OutputStream stream = Files.newOutputStream(:[v0])
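Applied to concrete code, the two rewrite rules on this slide would turn `java.io.File` usage into `java.nio.file` usage roughly as shown below. The wrapper class and method names are hypothetical, and the sketch assumes the expression bound to `:[v0]` has itself already been migrated to `Path`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical before/after pairs for the two rewrite rules.
class FileToPathMigrationSketch {

    // Rule 1, before: new File(parent, child)
    static File childBefore(File dir, String name) {
        return new File(dir, name);
    }

    // Rule 1, after: parent.resolve(child)
    static Path childAfter(Path dir, String name) {
        return dir.resolve(name);
    }

    // Rule 2, before: new FileOutputStream(target)
    static void writeBefore(File target, byte[] data) throws IOException {
        try (FileOutputStream stream = new FileOutputStream(target)) {
            stream.write(data);
        }
    }

    // Rule 2, after: Files.newOutputStream(target)
    static void writeAfter(Path target, byte[] data) throws IOException {
        try (OutputStream stream = Files.newOutputStream(target)) {
            stream.write(data);
        }
    }
}
```

Both write variants create the file if needed and overwrite existing content, so the rewrite keeps the behavior while changing the declared types.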
  • 82. Why We Refactor++ A large-scale empirical study • We developed detection rules for refactoring motivations • We optimized the rules on the FSE’16 “Why We Refactor” dataset • We validated their accuracy on the TOSEM’2020 dataset • Precision: 98.4% Recall: 93.5% • We collected the motivations for 346K Extract Method instances found in 132,897 commits of 325 open-source repositories Sadegh Aalizadeh J. Pantiuchina, F. Zampetti, S. Scalabrino, V. Piantadosi, R. Oliveto, G. Bavota, and M. Di Penta, "Why Developers Refactor Source Code: A Mining-based Study," ACM Transactions on Software Engineering and Methodology, Volume 29, Issue 4, Article 29, September 2020.
  • 84. Refactoring recommendation tools for top-5 Extract Method motivations 1. Reusable Method [NONE] 2. Remove Duplication [CeDAR (Tairas and Gray, IST’12), Creios (Hotta et al., CSMR’12), Rase (Meng et al., ICSE’15), JDeodorant (Tsantalis et al., TSE’15 + ICSE’17), CRec (Yue et al., ICSME’18)] 3. Facilitate Extension [FR-Refactor (Ally S. Nyamawe et al. RE’19 + EMSE’20)] 4. Decompose for Readability [JDeodorant (Tsantalis and Chatzigeorgiou, JSS’11), JExtract (Silva et al., ICPC’14), SEMI (Charalampidou et al., TSE’17), GEMS (Xu et al., ISSRE’17)] 5. Alternative Signature [NONE]
  • 88. Making better refactoring mining tools is a community effort Mining competitions hosted in the Workshop Community-created benchmarks Hackathons for improving tools

Editor's Notes

  1. This painting shows Hercules at the crossroads. There are two women embodying Virtue and Vice. On the right, Vice appears as a young attractive woman pointing Hercules to an easy, well-paved downhill path leading to a life of pleasure in the short term. But in the long term, we can see disaster in the background. On the left, Virtue appears as an old exhausted woman pointing Hercules to a hill that is difficult to climb but leads to everlasting honour. We have the same choices in research. We can always take the easy path and enjoy the temporary benefits of publishing a solution first. The best solution takes a lot of hard work but will eventually have more impact.
  2. Both studies used RefactoringMiner to collect refactorings at commit level and had more fine-grained information about the location of the refactoring. In addition, both studies performed manual inspections to validate their findings.
  3. https://github.com/jodavimehran/CodeTracker