How Do I Refactor This?
An Empirical Study on
Refactoring Trends and
Topics in Stack Overflow
Anthony Peruma · Steven Simmons ·
Eman AlOmar · Christian Newman ·
Mohamed Mkaouer · Ali Ouni
ICSE Journal-First Paper
Software Refactoring
An essential part of software maintenance and evolution
Improves the internal quality of the system and reduces its
technical debt
Research in refactoring is well-established
➢ Detection of refactoring opportunities & code recommendations
Refactoring research is
continually evolving
Are developers applying refactorings in the
same environments, on problems with the
same characteristics and context, as
researchers assume?
• Refactoring is no longer about correcting
code smells
• Industry projects are complex and require
more complicated solutions
• Prior studies interviewed developers
GOAL
Understand the trends and challenges
around developer discussions on software
refactoring concepts and activities
Stack Overflow: the most popular programming-specific question-and-answer forum
Over 19 million questions and one million users
Research Questions
RQ1: How have refactoring discussions on Stack Overflow
grown over the years?
RQ2: What do developers discuss in refactoring-based
Stack Overflow posts?
RQ3: Which topics are the most popular and difficult
among refactoring-related questions?
Study Methodology
Experiment design
Posts – Questions, Answers &
Accepted Answers
Tags – Associated with a question
Score – The higher the score,
the better
View Count – Number of times
the post was viewed
Posts with the refactor tag
Posts having ‘refactor’ in the
title
Quantitative – database
queries and custom code
Qualitative – manually
analyzing a statistically
significant sample
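One common way to size a sample for manual analysis is Cochran's formula with a finite-population correction. The sketch below is illustrative only: the slides do not state the confidence level or margin of error used, so the 95% confidence level (z = 1.96), the 5% margin, and the function name `sample_size` are assumptions. The population of 9,489 is the number of refactoring questions reported in the study.

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite-population correction.

    z, margin and p are assumed defaults (95% confidence, 5% margin,
    maximum variability); the slides do not give these values.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Population: the 9,489 refactoring questions reported in the study.
questions_to_review = sample_size(9489)
```

A tighter margin or a higher confidence level raises the number of questions that need manual review.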
Anatomy of a post
[Figure: a Stack Overflow post annotated with its parts. Question: title, body, tags, score, views. Answer: score, accepted-answer marker.]
A mixed-methods approach
Summary of collected data
Empirical Results
How have refactoring discussions on
Stack Overflow grown over the years?
1. How have refactoring posts grown throughout
the years?
2. What is the distribution of questions and answers
among developers?
3. What are the tags that are associated with
refactoring questions?
RQ 1
RQ 1.1: How have refactoring posts grown throughout the
years?
Approach:
• Extract all questions that had the term ‘refactor’ in either the title or tag
• Extract all answers (i.e., accepted and non-accepted) associated with the
questions
Findings:
• 9,489 questions, of which 828 did not have an associated answer
• Median time between a question and its first answer is 0.27 hours
• While the number of questions and accepted answers has increased yearly,
the rate of that increase has been falling
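The selection rule and the time-to-first-answer measure can be sketched as follows. This is a minimal illustration on hand-made records, not the authors' queries: the field names (`title`, `tags`, `created`, `answers`) and the sample posts are assumptions.

```python
from datetime import datetime
from statistics import median

# Hypothetical records; real data would come from the Stack Overflow data dump.
questions = [
    {"title": "How do I refactor this loop?", "tags": ["java"],
     "created": datetime(2020, 1, 1, 10, 0),
     "answers": [datetime(2020, 1, 1, 10, 15), datetime(2020, 1, 1, 12, 0)]},
    {"title": "Splitting a large class", "tags": ["c#", "refactoring"],
     "created": datetime(2020, 1, 2, 9, 0),
     "answers": [datetime(2020, 1, 2, 9, 45)]},
    {"title": "Why is my build slow?", "tags": ["gradle"],
     "created": datetime(2020, 1, 3, 8, 0), "answers": []},
    {"title": "Refactor SQL script", "tags": ["sql"],
     "created": datetime(2020, 1, 4, 8, 0), "answers": []},
]

def is_refactoring_question(q):
    """Keep a question when 'refactor' occurs in its title or in any tag."""
    in_title = "refactor" in q["title"].lower()
    in_tags = any("refactor" in t.lower() for t in q["tags"])
    return in_title or in_tags

selected = [q for q in questions if is_refactoring_question(q)]
unanswered = [q for q in selected if not q["answers"]]

# Hours between each answered question and its earliest answer.
hours_to_first_answer = [
    (min(q["answers"]) - q["created"]).total_seconds() / 3600
    for q in selected if q["answers"]
]
median_hours = median(hours_to_first_answer)
```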
RQ 1.2: What is the distribution of questions and answers
among developers?
Approach:
• Utilize the OwnerUserId field to identify the creator of a post
Findings:
• 7,795 distinct users are responsible for creating all refactoring questions
• Most developers who ask questions tend only to ask and not to answer
questions
• Most developers ask only one refactoring question
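Grouping posts by their owner gives the distribution described above. The sketch below uses made-up `(OwnerUserId, post type)` pairs; only the `OwnerUserId` field name comes from the slides, and the rest is an assumption for illustration.

```python
from collections import Counter

# Hypothetical (OwnerUserId, post type) pairs.
posts = [
    (101, "question"), (101, "question"), (102, "question"),
    (103, "question"), (103, "answer"), (104, "answer"), (105, "question"),
]

questions_per_user = Counter(uid for uid, kind in posts if kind == "question")
answerers = {uid for uid, kind in posts if kind == "answer"}

askers = set(questions_per_user)
ask_only = askers - answerers                      # asked but never answered
single_question_askers = [u for u, n in questions_per_user.items() if n == 1]

share_ask_only = len(ask_only) / len(askers)
share_single = len(single_question_askers) / len(askers)
```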
RQ 1.3: What are the tags that are associated with
refactoring questions?
Approach:
• Extract all distinct tags from all refactoring posts
• Manual review of the tags
Findings:
• 3,053 distinct tags
• Top five tags are related to programming
languages (or web frameworks) – Java, C#,
JavaScript, Ruby on Rails, and Ruby
• Constant rise in JavaScript questions
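Tag extraction can be sketched as below. The example assumes each question stores its tags in a single string of the form `<tag1><tag2>`; that storage format and the sample values are assumptions, not details given in the slides.

```python
import re
from collections import Counter

# Hypothetical tag strings in the assumed "<tag1><tag2>" form.
tag_fields = [
    "<java><refactoring>",
    "<c#><refactoring><visual-studio>",
    "<javascript><refactoring>",
    "<java><design-patterns>",
]

def split_tags(field):
    """Return the individual tags contained in one tag string."""
    return re.findall(r"<([^<>]+)>", field)

tag_counts = Counter(tag for f in tag_fields for tag in split_tags(f))
distinct_tags = len(tag_counts)
top_tags = tag_counts.most_common(2)
```

The manual review step would then group the distinct tags into categories such as languages, frameworks, and programming concepts.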
RQ 1 Summary
How have refactoring discussions on Stack Overflow
grown over the years?
• Stack Overflow is a popular venue for refactoring discussions between developers
• Refactoring questions usually receive a response in a short period of time
• There is a rise in questions around dynamically typed languages such as JavaScript
• Most tags are on algorithm and programming concepts, followed by frameworks
What do developers discuss in
refactoring-based Stack Overflow posts?
1. What are the frequent terms utilized by developers
in refactoring discussions?
2. To what extent do traditional refactoring
opportunities, known in existing literature, match
with the challenges faced by developers in Stack
Overflow posts?
3. What are the topics around software refactoring
that are being asked by developers?
RQ 2
RQ 2.1: What are the frequent terms utilized by developers
in refactoring discussions?
Approach:
• Extract the top keywords as bigrams from question posts
• Check for the existence of terms that correspond to refactoring operations
Findings:
• IDE ‘visual studio’ plays an important part in refactoring
discussions – the IDE supports multiple languages
• ‘refactoring tool’ shows the importance of, and reliance on, tools
and IDEs in refactoring activities
• ‘legacy code’ highlights a common reason why developers
request support with refactoring
• Code extraction and moving are frequently discussed
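Bigram extraction of the kind described in the approach can be sketched as follows. The stop-word list, the tokenizer, and the sample titles are assumptions chosen for illustration; the study's actual preprocessing is not described in the slides.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "the", "in", "is", "to", "my", "how", "do", "i",
              "with", "for", "any", "there"}

titles = [
    "How do I refactor legacy code in Visual Studio?",
    "Is there a refactoring tool for legacy code?",
    "Extract method with the Visual Studio refactoring tool",
]

def bigrams(text):
    """Lowercase, drop stop words, then pair each token with its successor.

    Removing stop words first means words separated only by a stop word
    become adjacent, which is a simplification.
    """
    tokens = [t for t in re.findall(r"[a-z#+]+", text.lower())
              if t not in STOP_WORDS]
    return list(zip(tokens, tokens[1:]))

counts = Counter(bg for t in titles for bg in bigrams(t))
frequent = [bg for bg, n in counts.items() if n >= 2]
```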
RQ 2.2: To what extent do traditional refactoring opportunities, known in
existing literature, match with the challenges faced by developers in Stack
Overflow posts?
Approach:
• Occurrence of Self-Affirmed Refactoring terms in questions
Findings:
• Frequent mention of key internal quality attributes -- dependency, inheritance
• Use of terms such as ‘clean up’ or ‘redesign’ to discuss refactorings
• Non-functional attribute discussion around ‘readability’, ‘efficiency’, and
‘performance’
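Counting how many questions mention each term can be sketched as below. The term list contains only the examples named in the findings above, not the full Self-Affirmed Refactoring catalogue, and the sample questions are made up.

```python
import re
from collections import Counter

# Illustrative terms taken from the findings; not the complete term list.
TERMS = ["clean up", "redesign", "readability", "efficiency",
         "performance", "dependency", "inheritance"]

questions = [
    "How can I clean up this class to improve readability?",
    "Should I redesign the inheritance hierarchy?",
    "Refactoring for performance and readability",
]

def terms_in(text):
    """Return the set of terms that occur as whole words or phrases."""
    lowered = text.lower()
    return {t for t in TERMS
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)}

# Each question counts at most once per term.
occurrence = Counter(t for q in questions for t in terms_in(q))
```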
RQ 2.3: What are the topics around software refactoring that
are being asked by developers?
Approach:
• Topic modeling analysis using
latent Dirichlet allocation
• Includes text-preprocessing
• Use of topic coherence, perplexity
and visualization to determine the
optimum number of topics
• Manual analysis of a statistically
significant sample of questions
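To make the topic-modeling step concrete, here is a toy collapsed Gibbs sampler for latent Dirichlet allocation written with the standard library only. It is a teaching sketch, not the tooling used in the study: the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, iteration count), and the tiny pre-tokenized corpus are all assumptions, and a real analysis would use an established LDA implementation together with coherence and perplexity scores to choose the number of topics.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + alpha * num_topics)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[t].most_common() if c > 0][:3]
        for t in range(num_topics)
    ]
    return theta, top_words

# Hypothetical, already preprocessed question texts.
docs = [
    ["rename", "class", "ide", "tool"],
    ["ide", "tool", "rename", "method"],
    ["sql", "query", "table", "procedure"],
    ["table", "sql", "procedure", "query"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` gives one question's mix of topics, and `top_words` lists the most frequent words per topic, which is what a reviewer would read when naming the topics manually.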
Findings:
Code Optimization
• Simplifying code structures
• Improving readability and reusability
• Reducing lengthy switch-case statements, loops, and duplicate code
Tools and IDEs
• Performing complex refactorings
• Renaming software artifacts
Architecture and Design Patterns
• Accumulation of code updates violates design principles
• Applying SOLID, DRY, SRP, and KISS principles
Unit Testing
• Challenges with evolving the test suite alongside the source code
Database
• Business logic within SQL scripts grows in length and complexity
• Challenges with readability, design principles, and system performance
RQ 2 Summary
What do developers discuss in refactoring-based
Stack Overflow posts?
• Refactoring discussions revolve around five topics – Code Optimization, Tools and
IDEs, Architecture and Design Patterns, Unit Testing, and Database
• Maintainability is a key concern
• Improving readability and reusability is of utmost concern
• Challenges in synchronizing refactoring changes across software engineering artifacts
Which topics are the most popular
and difficult among refactoring-
related questions?
RQ 3
Which topics are the most popular and difficult among
refactoring-related questions?
Approach:
• Measure popularity using a question's view count, favorite count, and score
• Measure difficulty: questions without answers, without accepted answers and
median time for an accepted answer
Findings:
• Questions on Tools/IDEs are the most popular; questions on Database are the least popular
• Tools/IDE questions get more views than Code Optimization questions
• Questions on Tools/IDEs go unanswered more often than others
• Code Optimization questions are less challenging to answer
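The popularity and difficulty measures listed in the approach can be computed per topic roughly as follows. The records and field names are hypothetical, and using means for the popularity measures is an assumption; the slides name the measures but not how they are aggregated.

```python
from statistics import median

# Hypothetical per-question records labelled with a topic.
questions = [
    {"topic": "Tools and IDEs", "views": 900, "score": 5, "favorites": 3,
     "answers": 0, "accepted_hours": None},
    {"topic": "Tools and IDEs", "views": 700, "score": 4, "favorites": 1,
     "answers": 2, "accepted_hours": 3.0},
    {"topic": "Code Optimization", "views": 300, "score": 2, "favorites": 0,
     "answers": 3, "accepted_hours": 0.5},
    {"topic": "Code Optimization", "views": 500, "score": 3, "favorites": 1,
     "answers": 1, "accepted_hours": 1.5},
]

def topic_metrics(rows):
    """Aggregate popularity (means) and difficulty measures for one topic."""
    accepted = [r["accepted_hours"] for r in rows
                if r["accepted_hours"] is not None]
    n = len(rows)
    return {
        "mean_views": sum(r["views"] for r in rows) / n,
        "mean_score": sum(r["score"] for r in rows) / n,
        "mean_favorites": sum(r["favorites"] for r in rows) / n,
        "pct_no_answer": 100 * sum(r["answers"] == 0 for r in rows) / n,
        "pct_no_accepted": 100 * sum(r["accepted_hours"] is None for r in rows) / n,
        "median_accepted_hours": median(accepted) if accepted else None,
    }

topics = sorted({q["topic"] for q in questions})
metrics = {t: topic_metrics([q for q in questions if q["topic"] == t])
           for t in topics}
```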
Discussion & Takeaways
Supporting the community
Research/Academic
Community
Developer
Community
Tool/IDE Vendor
Community
Research/Academic community
• Course curriculum to reflect real-world settings
• Adaptation of refactoring operations for multiple
programming languages and artifact types
• Improve and extend the applicability of
readability quality metrics
• Expand the study and applicability of reusability
beyond source code
Tool/IDE vendor community
• Automatic synchronization between project
artifacts
• Enhanced rename refactoring functionality
• Enhance the user experience
Developer community
• Extend coding standards utilized in projects to
support naming standards for all project artifacts
• Integrate code quality tools into the build
process for the early detection of poor coding
practices
• Perform frequent and early peer-reviews on all
project artifacts
Conclusion
A quantitative and qualitative analysis of refactoring questions asked by
developers on Stack Overflow
Findings:
• Stack Overflow is a popular venue for developers to seek assistance with refactoring
• Growth in refactoring dynamically typed code such as Python and JavaScript
• Most questions are around optimizing source code to improve readability and reusability
• Refactoring is not limited to source code – database and unit testing artifact refactoring is common
• Tools are also a popular discussion topic among developers
Preprint: https://arxiv.org/abs/2110.12229
Thank You!
Anthony Peruma
http://peruma.me
http://scanl.org