Refactoring Mining - The key to unlock software evolution

Refactoring
Mining
The key to unlock
software evolution
Nikolaos Tsantalis
Concordia University

Alexander
Chatzigeorgiou
Eleni
Stroulia
Zhenchang
Xing
Fabio
Rocha

• Ideas inspired from UMLDiff
• Structure of method bodies is ignored
• Method body includes only method calls and
field accesses
• Most refactorings detected based on
signature matching
• Only precision is provided
• Evaluation included only 3 systems

Why did I stop
working on
this research?
Ref-Finder (ICSM 2010): "The precision
and recall on open source projects
were 0.74 and 0.96 respectively."
"Since these programs did not document
refactorings, we created a set of correct
refactorings by running REF-FINDER with a
similarity threshold (σ=0.65) and manually
verified them. We then measured a recall
by comparing this set with the results
found using a higher threshold (σ=0.85)"

Lesson #1
Don't be intimidated by super
results
Always question the results

Until one day... Danilo Silva Marco Tulio
Valente

Lesson #2
Always make your code & data
available in a repository
You never know what it can
enable in the future

Open Science initiatives
M. T. Baldassarre, N. Ernst, B. Hermann, T. Menzies, R. Yedida
Mandatory
Data Availability
field

Let's do an
empirical
study about
refactoring
We must study
Why We Refactor

• Danilo developed the API of RefactoringMiner
• Tooling for checking out and parsing Git commits
• Infrastructure for monitoring GitHub projects
• Automatic generation of emails to contact developers
• A web app for thematic analysis

Firehouse interview
• Monitored 124 GitHub projects between June 8th and August 7th, 2015
• Sent 465 emails and received 195 responses (42%)
• +27 commits with a description explaining the reasons
• Compiled a catalogue of 44 distinct motivations for 12
well-known refactoring types

Motivation
Catalogue
Artifact
https://github.com/aserg-
ufmg/why-we-refactor

ICSE'16 rejection
Reviewer #1: "A major threat to the research is not discussed or
considered, that RefFinder has poor recall (0.24 [31]). The authors did a
good job of combating the low-precision by manually inspecting
results, the low recall is not discussed or dealt with."

Lesson #3
Even ICSE reviewers make
mistakes
Don't get disappointed.
Improve, advance, re-submit

FSE'16 re-submission
• Renamed the tool from RefDetectorto RefactoringMiner
• Addressed most of reviewers' comments

RefDiff MSR'17
RefactoringMiner ...

Limitations of previous approaches
1. Dependence on similarity thresholds
• thresholds need calibration for projects with different characteristics
2. Dependence on built versions
• only 38% of the change history can be successfully compiled [Tufano et al., 2017]
3. Unreliable oracles for evaluating precision/recall
• Incomplete (refactorings found in release notes or commit messages)
• Biased (applying a single tool with two different similarity thresholds)
• Artificial (seeded refactorings)

private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
try {
addresses[i] =
}
}
}
return addresses;
}
After
Before

try {
addresses[i] =
}
}
}
return addresses;
}
private static List<Address> createAddresses(int count) {
List<Address> addresses = new ArrayList<Address>(count);
try {
addresses[i] =
}
}
}
return addresses;
}
After
Before

try {
addresses[i] =
}
}
}
return addresses;
}
After
Before
private static List<Address> createAddresses(AtomicInteger ports, int count){
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
}
}
return addresses;
}

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
}
After
Before

try {
addresses[i] =
}
}
}
return addresses;
}
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
}
return null;
}
After
Before

try {
}
}
return null;
}
try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
After
Before
textual similarity  30%

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
}
}
return null;
}
After
Before
(1) Abstraction

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
}
}
return null;
}
After
Before
(2) Argumentization

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
}
return null;
}
After
Before
(2) Argumentization

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
}
return null;
}
After
Before
(3) AST Node Replacements

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
}
return null;
}
After
Before
(3) AST Node Replacements

try {
addresses[i] =
}
}
}
return addresses;
}
}
return addresses;
}
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
}
return null;
}
After
Before
textual similarity = 100%

Reliable oracle
• We used the “Why We Refactor” dataset
(538 commits from 185 open source projects)
• We executed both available tools (RefactoringMiner and RefDiff)
• Converted the output of the tools to the same format
• We manually validated all refactoring instances (4,108 unique instances,
out of which 3,188 were true positives and 920 were false positives)
• The validation process was labor-intensive and involved 3 validators for
a period of 3 months (i.e., 9 person-months)
• To compute recall, we considered the union of the true positives
reported by both tools as the ground truth.
Matin Mansouri Laleh Eshkevari Davood Mazinanian

ICSE’18 retrospective
• Our solution was still far from perfect
• Time pressure
• It was urgent to establish RefactoringMiner with a publication
• ICSE deadline: August 25, 2017
• Sophia’s birth: August 15, 2017
• Despite working over 1 year on this paper, there was still space for
improvement
• The entire team graduated after this work

• Added more replacement types
• 20 new sub-method level refactoring types
detected based on AST node replacements
• Nested refactoring detection
• Refactoring inference
• Improved the matching of method calls to
method declarations with argument type
inference

Even more reliable oracle
• We executed all available tools
• RefactoringMiner 1.0 and 2.0
• RefDiff 0.1.1, 1.0, 2.0
• GumTreeDiff 2.1.2
• Converted the output of the tools to the same format
• We validated 5,830 new unique refactoring instances, out of which
4,038 were true positives and 1,792 were false positives.
• 7,226 true positives in total for 40 different refactoring types
(72% of true instances are detected by two or more tools)
Ameya Ketkar

From loop and if to
Stream API .forEach() and .filter()

Current status
• 85 supported refactoring types
• 10,900 validated true positives in the oracle
• Precision: 99.7%
• Recall: 97%
• Most tested and reliable version (200K commits without exception)
• Independent studies confirm it has the best precision

Lesson #4
What it takes to make a reliable & usable tool
Minimum 5 years of research and development
Extensive testing
Detailed documentation (README with API code snippets)
Supporting users (200+ issues resolved)
Stable project leader
Great team

Promise #1
“researchers can replicate existing empirical studies
and refute or confirm previously-held beliefs.”

Threats to validity
• Both studies relied on Ref-Finder
• Independent studies revealed that Ref-Finder had low precision/recall
• 35% precision, 24% recall [Soares et al. JSS 2013]
• 27% precision [Kádár et al. PROMISE 2016]
• Ref-Finder paper claimed 74% precision, 96% recall
• Collected refactoring info at release level (between two versions)
• Coarse-grained analysis led to strong and unsafe assumptions

Lesson #5
A tool comes with a huge
responsibility for its authors
We should go above and beyond
when evaluating its accuracy

Promise #2
“reduce the noise created by refactorings, such as
file/directory renaming, and significantly improve the
accuracy of other tools.”

Lesson #6
Making existing code evolution analysis techniques
refactoring-aware can improve their accuracy

Promise #3
“our oracle of true refactorings from 538 commits across 185
projects provides an invaluable resource for validating novel
refactoring tools and for comparing existing approaches”

Promise #4
“enable online refactoring detection on partial input, when a
developer inspects a code diff to review a change, or tries
to understand code evolution selectively”

• Partially refactoring-aware
• Supports method signature changes
• Method moves
• File renames/moves
• Uses thresholds when comparing
program elements, calibrated on a
training set of 100 methods
• Mismatches methods from which a
significant part of their body has been
extracted to new methods, as it uses a
75% body similarity threshold to match
modified methods
• Ultra-fast: Less than 2 seconds to fetch
the entire change history of a method

Challenge
How can we take advantage of RefactoringMiner accuracy
in a way that is not computationally expensive?
Mehran Jodavi
Solution
Partial and incremental commit analysis based on
the location of the tracked program element

Precision +7.5%
Recall +3.7%
Precision +9.1%
Recall +5.8%
> 97% precision/recall change level
Method tracking commit level
Method tracking change level
Variable tracking
> 98% precision/recall commit level

Promise #5
“Integrating RefactoringMiner with the diff and code review
tools can raise the level of abstraction for code changes
originating from refactorings, thus helping developers better
understand the code evolution”

Diff Hunk Graph
Nodes: diff hunks
Hard links: syntactic constraints between the AST nodes
Soft links: repetitive change patterns
Refactoring links: edits in different locations originating
from the same refactoring
Cosmetic links: changes in comments & reformatting

Promise #6
“refactoring operations can be automatically documented at
commit-time to provide a more detailed description of the
applied changes in the commit message”

File f = new File(:[v0], :[v1]) 
Path f = :[v0].resolve(:[v1])
FileOutputStream stream = new FileOutputStream(:[v0]) 
OutputStream stream = Files.newOutputStream(:[v0])

Ameya Ketkar
Rewrite rules

Why We Refactor++
A large-scale empirical study
• We developed detection rules for refactoring motivations
• We optimized the rules on the FSE’16 “Why We Refactor” dataset
• We validated their accuracy on the TOSEM’2020 dataset
• Precision: 98.4% Recall: 93.5%
• We collected the motivations for 346K Extract Method instances
found in 132,897 commits of 325 open-source repositories
Sadegh Aalizadeh
J. Pantiuchina, F. Zampetti, S. Scalabrino, V. Piantadosi, R. Oliveto, G. Bavota, and M. Di Penta,
"Why Developers Refactor Source Code: A Mining-based Study,“
ACM Transactions on Software Engineering and Methodology, Volume 29, Issue 4, Article 29, September 2020.

Refactoring recommendation tools for
top-5 Extract Method motivations
1. Reusable Method [NONE]
2. Remove Duplication [CeDAR (Tairas and Gray, IST’12), Creios (Hotta et al.,
CSMR’12), Rase (Meng et al., ICSE’15), JDeodorant (Tsantalis et al., TSE’15 +
ICSE’17), CRec (Yue et al., ICSME’18)]
3. Facilitate Extension [FR-Refactor (Ally S. Nyamawe et al. RE’19 + EMSE’20)]
4. Decompose for Readability [JDeodorant (Tsantalis and Chatzigeorgiou,
JSS’11), JExtract (Silva et al., ICPC’14), SEMI (Charalampidou et al., TSE’17),
GEMS (Xu et al., ISSRE’17)]
5. Alternative Signature [NONE]

Making better refactoring mining tools is
a community effort
Mining competitions hosted in the Workshop
Community-created benchmarks
Hackathons for improving tools

https://github.com/tsantalis/RefactoringMiner
https://github.com/JetBrains-Research/RefactorInsight
https://github.com/JetBrains-Research/kotlinRMiner
https://github.com/JetBrains-Research/data-driven-type-migration
Oleg Smirnov Ameya Ketkar
Zarina Kurbatova
https://github.com/jodavimehran/code-tracker
Mehran Jodavi

Refactoring Mining - The key to unlock software evolution

Recommended

Recommended

More Related Content

Similar to Refactoring Mining - The key to unlock software evolution

Similar to Refactoring Mining - The key to unlock software evolution (20)

More from Nikolaos Tsantalis

More from Nikolaos Tsantalis (16)

Recently uploaded

Recently uploaded (20)

Refactoring Mining - The key to unlock software evolution

Editor's Notes