How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow

Presented at: The 44th IEEE/ACM International Conference on Software Engineering (ICSE 2022)

Date of Conference: May 2022
Conference Location: Virtual & Pittsburgh, PA, USA

This paper was originally published in the Empirical Software Engineering journal

The preprint is available at: https://arxiv.org/pdf/2110.12229

A video of the presentation is available at: https://youtu.be/suWRL2nmxMs

  1. How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow
     Anthony Peruma · Steven Simmons · Eman AlOmar · Christian Newman · Mohamed Mkaouer · Ali Ouni
     ICSE Journal-First Paper
  2. Software Refactoring
     An essential part of software maintenance and evolution
     Improves the internal quality of the system and reduces its technical debt
     Research in refactoring is well-established
     ➢ Detection of refactoring opportunities & code recommendations
  3. Refactoring research is continually evolving
     Are developers applying refactorings in the same environments, on problems with the same characteristics and context, as researchers assume?
     • Refactoring is no longer about correcting code smells
     • Industry projects are complex and require more complicated solutions
     • Prior studies interviewed developers
  4. GOAL
     Understand the trends and challenges around developer discussions on software refactoring concepts and activities
  5. The most popular programming-specific question and answer forum
     Over 19 million questions and one million users
  6. Research Questions
     RQ1: How have refactoring discussions on Stack Overflow grown over the years?
     RQ2: What do developers discuss in refactoring-based Stack Overflow posts?
     RQ3: Which topics are the most popular and difficult among refactoring-related questions?
  7. Study Methodology
  8. Experiment design
     Data fields:
     • Posts – Questions, Answers & Accepted Answers
     • Tags – Associated with a question
     • Score – The higher the score, the better
     • View Count – Number of times the post was viewed
     Selection criteria:
     • Posts with the refactor tag
     • Posts having ‘refactor’ in the title
     Analysis:
     • Quantitative – database queries and custom code
     • Qualitative – manually analyzing a statistically significant sample
  9. Anatomy of a post
     [Figure: a QUESTION with its title, body, tags, score, and views; an ANSWER with its score and accepted-answer marker]
  10. A mixed-methods approach
  11. Summary of collected data
  12. Empirical Results
  13. RQ 1: How have refactoring discussions on Stack Overflow grown over the years?
      1. How have refactoring posts grown throughout the years?
      2. What is the distribution of questions and answers among developers?
      3. What are the tags that are associated with refactoring questions?
  14. RQ 1.1: How have refactoring posts grown throughout the years?
      Approach:
      • Extract all questions that had the term ‘refactor’ in either the title or tag
      • Extract all answers (i.e., accepted and non-accepted) associated with the questions
      Findings:
      • 9,489 questions, of which 828 did not have an associated answer
      • Median time between a question and its first answer is 0.27 hours
      • While the number of questions and accepted answers has increased yearly, the volume by which it increased has been falling
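
The selection step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes each post is a dictionary with field names modeled on the Stack Exchange data dump (`PostTypeId`, `Title`, `Tags`, `ParentId`, `CreationDate`), that tags are already split into a list, and the three-question sample is invented.

```python
from datetime import datetime
from statistics import median

def is_refactoring_question(post):
    """True for a question (PostTypeId 1) with 'refactor' in its title or a tag."""
    if post["PostTypeId"] != 1:
        return False
    in_title = "refactor" in post["Title"].lower()
    in_tags = any("refactor" in tag.lower() for tag in post["Tags"])
    return in_title or in_tags

def hours_to_first_answer(question, posts):
    """Hours between a question and its earliest answer, or None if unanswered."""
    asked = datetime.fromisoformat(question["CreationDate"])
    answered = [
        datetime.fromisoformat(p["CreationDate"])
        for p in posts
        if p["PostTypeId"] == 2 and p["ParentId"] == question["Id"]
    ]
    if not answered:
        return None
    return (min(answered) - asked).total_seconds() / 3600

# Invented sample posts (assumption: not taken from the study's dataset).
posts = [
    {"Id": 1, "PostTypeId": 1, "Title": "How do I refactor this switch statement?",
     "Tags": ["java"], "CreationDate": "2020-01-01T10:00:00"},
    {"Id": 2, "PostTypeId": 1, "Title": "Null pointer in loop",
     "Tags": ["java", "refactoring"], "CreationDate": "2020-01-02T09:00:00"},
    {"Id": 3, "PostTypeId": 1, "Title": "Sorting a list",
     "Tags": ["python"], "CreationDate": "2020-01-03T09:00:00"},
    {"Id": 4, "PostTypeId": 2, "ParentId": 1, "CreationDate": "2020-01-01T10:30:00"},
    {"Id": 5, "PostTypeId": 2, "ParentId": 1, "CreationDate": "2020-01-01T12:00:00"},
]

questions = [p for p in posts if is_refactoring_question(p)]
delays = {q["Id"]: hours_to_first_answer(q, posts) for q in questions}
unanswered_ids = [qid for qid, hours in delays.items() if hours is None]
median_hours = median(h for h in delays.values() if h is not None)
```

Matching on the substring `refactor` also catches `refactoring` and `refactored`, which appears to be what the slide's wording intends, though the exact matching rule used in the study is not stated here.
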
  15. RQ 1.2: What is the distribution of questions and answers among developers?
      Approach:
      • Utilize the OwnerUserId field to identify the creator of a post
      Findings:
      • 7,795 distinct users are responsible for creating all refactoring questions
      • Most developers who ask questions tend only to ask and not to answer questions
      • Most developers ask only one refactoring question
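
Grouping posts by `OwnerUserId` is a simple counting exercise. The sketch below is a hypothetical illustration: the function name `asker_stats`, the dictionary layout, and the sample user IDs are assumptions rather than anything taken from the study.

```python
from collections import Counter

def asker_stats(posts):
    """Summarize who asks and who answers, keyed on OwnerUserId."""
    questions_per_user = Counter(
        p["OwnerUserId"] for p in posts if p["PostTypeId"] == 1
    )
    answerers = {p["OwnerUserId"] for p in posts if p["PostTypeId"] == 2}
    askers = set(questions_per_user)
    return {
        "distinct_askers": len(askers),
        # Users who asked at least one question but never answered one.
        "ask_only": len(askers - answerers),
        # Users who asked exactly one question.
        "single_question": sum(1 for n in questions_per_user.values() if n == 1),
    }

# Invented sample: PostTypeId 1 = question, 2 = answer.
sample_posts = [
    {"PostTypeId": 1, "OwnerUserId": 10},
    {"PostTypeId": 1, "OwnerUserId": 10},
    {"PostTypeId": 1, "OwnerUserId": 11},
    {"PostTypeId": 1, "OwnerUserId": 12},
    {"PostTypeId": 2, "OwnerUserId": 12},
    {"PostTypeId": 2, "OwnerUserId": 99},
]

stats = asker_stats(sample_posts)
```
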
  16. RQ 1.3: What are the tags that are associated with refactoring questions?
      Approach:
      • Extract all distinct tags from all refactoring posts
      • Manual review of the tags
      Findings:
      • 3,053 distinct tags
      • Top five tags are related to programming languages (or web frameworks) – Java, C#, JavaScript, Ruby on Rails, and Ruby
      • Constant rise in JavaScript questions
  17. RQ 1 Summary: How have refactoring discussions on Stack Overflow grown over the years?
      • Stack Overflow is a popular venue for refactoring discussions between developers
      • Refactoring questions usually receive a response in a short period of time
      • There is a rise in questions around dynamically typed languages such as JavaScript
      • Most tags are on algorithm and programming concepts, followed by frameworks
  18. RQ 2: What do developers discuss in refactoring-based Stack Overflow posts?
      1. What are the frequent terms utilized by developers in refactoring discussions?
      2. To what extent do traditional refactoring opportunities, known in existing literature, match with the challenges faced by developers in Stack Overflow posts?
      3. What are the topics around software refactoring that are being asked by developers?
  19. RQ 2.1: What are the frequent terms utilized by developers in refactoring discussions?
      Approach:
      • Extract the top keywords as bigrams from question posts
      • Check for the existence of terms that correspond to refactoring operations
      Findings:
      • The IDE ‘visual studio’ plays an important part in refactoring discussions – the IDE supports multiple languages
      • ‘refactoring tool’ shows the importance of, and reliance on, tools and IDEs in refactoring activities
      • ‘legacy code’ highlights a common reason why developers request support with refactoring
      • Code extraction and moving are frequently discussed
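
Bigram extraction of the kind described on this slide can be approximated with the standard library alone. The sketch below is a simplified stand-in, not the study's pipeline: the tokenizer, the tiny stop-word list, and the three sample titles are all assumptions made for illustration. Note that stop words are dropped before pairing, so a bigram may join two words that were separated by a stop word in the original text.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list (assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"a", "the", "in", "on", "for", "to", "how", "do", "i", "this"}

def bigrams(text):
    """Lowercase, tokenize, drop stop words, and pair adjacent tokens."""
    tokens = [t for t in re.findall(r"[a-z#+]+", text.lower()) if t not in STOP_WORDS]
    return list(zip(tokens, tokens[1:]))

# Invented question titles (not taken from the study's dataset).
titles = [
    "Refactoring legacy code in Visual Studio",
    "Best refactoring tool for legacy code",
    "Visual Studio extract method on legacy code",
]

counts = Counter(pair for title in titles for pair in bigrams(title))
top_two = counts.most_common(2)
```

On this toy input the two most frequent bigrams are `legacy code` and `visual studio`, which mirrors the kind of terms the slide reports without reproducing the study's actual counts.
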
  20. RQ 2.2: To what extent do traditional refactoring opportunities, known in existing literature, match with the challenges faced by developers in Stack Overflow posts?
      Approach:
      • Occurrence of Self-Affirmed Refactoring terms in questions
      Findings:
      • Frequent mention of key internal quality attributes – dependency, inheritance
      • Use of terms such as ‘clean up’ or ‘redesign’ to discuss refactorings
      • Non-functional attribute discussion around ‘readability’, ‘efficiency’, and ‘performance’
  21. RQ 2.3: What are the topics around software refactoring that are being asked by developers?
      Approach:
      • Topic modeling analysis using latent Dirichlet allocation
      • Includes text-preprocessing
      • Use of topic coherence, perplexity and visualization to determine the optimum number of topics
      • Manual analysis of a statistically significant sample of questions
      Findings: (continued on the next slide)
  22. RQ 2.3: What are the topics around software refactoring that are being asked by developers?
      Code Optimization
      • Simplifying code structures
      • Improve readability and reusability
      • Reduce lengthy switch-case statements, loops, and duplicate code
      Tools and IDEs
      • Perform complex refactorings
      • Renaming software artifacts
      Architecture and Design Patterns
      • Accumulation of code updates violates design principles
      • Applying SOLID, DRY, SRP, and KISS principles
      Unit Testing
      • Challenges with evolving the test suite alongside the source code
      Database
      • Business logic within SQL scripts grows in length and complexity
      • Challenges with readability, design principles, and system performance
  23. RQ 2 Summary: What do developers discuss in refactoring-based Stack Overflow posts?
      • Refactoring discussions revolve around five topics – Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database
      • Maintainability is a key concern
      • Improving readability and reusability is of utmost concern
      • Challenges in synchronizing refactoring changes across software engineering artifacts
  24. RQ 3: Which topics are the most popular and difficult among refactoring-related questions?
  25. Which topics are the most popular and difficult among refactoring-related questions?
      Approach:
      • Measure popularity using a question's view count, favorite count, and score
      • Measure difficulty using questions without answers, questions without accepted answers, and the median time to an accepted answer
      Findings:
      • Questions on Tools/IDEs are the most popular; Database questions are the least popular
      • Tools/IDE questions get more views than Code Optimization questions
      • Questions on Tools/IDEs go unanswered more often than questions on other topics
      • Code Optimization questions are less challenging to answer
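
The popularity and difficulty measures listed on this slide can be computed per topic as in the following sketch. It rests on several assumptions that the slide does not confirm: popularity values are averaged with the mean, each question already carries a `topic` label and a precomputed `hours_to_accepted` value (`None` when no answer was accepted), and the four sample questions are invented for illustration, not results from the study.

```python
from collections import defaultdict
from statistics import mean, median

def topic_metrics(questions):
    """Per-topic popularity (views, favorites, score) and difficulty measures."""
    by_topic = defaultdict(list)
    for q in questions:
        by_topic[q["topic"]].append(q)

    result = {}
    for topic, qs in by_topic.items():
        accepted_hours = [
            q["hours_to_accepted"] for q in qs if q["hours_to_accepted"] is not None
        ]
        result[topic] = {
            "mean_views": mean(q["ViewCount"] for q in qs),
            "mean_favorites": mean(q["FavoriteCount"] for q in qs),
            "mean_score": mean(q["Score"] for q in qs),
            "pct_unanswered": 100 * sum(q["AnswerCount"] == 0 for q in qs) / len(qs),
            "pct_no_accepted": 100 * sum(q["hours_to_accepted"] is None for q in qs) / len(qs),
            "median_hours_to_accepted": median(accepted_hours) if accepted_hours else None,
        }
    return result

# Invented sample questions (assumption: values chosen only to exercise the code).
sample_questions = [
    {"topic": "Tools/IDEs", "ViewCount": 1000, "FavoriteCount": 4, "Score": 10,
     "AnswerCount": 0, "hours_to_accepted": None},
    {"topic": "Tools/IDEs", "ViewCount": 3000, "FavoriteCount": 6, "Score": 20,
     "AnswerCount": 2, "hours_to_accepted": 5.0},
    {"topic": "Code Optimization", "ViewCount": 200, "FavoriteCount": 1, "Score": 2,
     "AnswerCount": 3, "hours_to_accepted": 0.5},
    {"topic": "Code Optimization", "ViewCount": 400, "FavoriteCount": 1, "Score": 4,
     "AnswerCount": 1, "hours_to_accepted": 1.5},
]

metrics = topic_metrics(sample_questions)
```

Keeping popularity and difficulty as separate columns, rather than collapsing them into a single score, makes it possible for a topic to rank high on one and low on the other, as the slide's findings for Tools/IDEs suggest.
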
  26. Discussion & Takeaways
  27. Supporting the community: Research/Academic Community · Developer Community · Tool/IDE Vendor Community
  28. Research/Academic community
      • Course curriculum to reflect real-world settings
      • Adaptation of refactoring operations for multiple programming languages and artifact types
      • Improve and extend the applicability of readability quality metrics
      • Expand the study and applicability of reusability beyond source code
  29. Tool/IDE vendor community
      • Automatic synchronization between project artifacts
      • Enhanced rename refactoring functionality
      • Enhance the user experience
  30. Developer community
      • Extend coding standards utilized in projects to support naming standards for all project artifacts
      • Integrate code quality tools into the build process for the early detection of poor coding practices
      • Perform frequent and early peer reviews on all project artifacts
  31. Conclusion
  32. Conclusion
      A quantitative and qualitative analysis of refactoring questions asked by developers on Stack Overflow
      Findings:
      • Stack Overflow is a popular venue for developers to seek assistance with refactoring
      • Growth in refactoring dynamically typed code such as Python and JavaScript
      • Most questions are around optimizing source code to improve readability and reusability
      • Refactoring is not limited to source code – database and unit testing artifact refactoring is common
      • Tools are also a popular discussion topic among developers
      Preprint: https://arxiv.org/abs/2110.12229
  33. Thank You!
      Anthony Peruma
      http://peruma.me
      http://scanl.org
