MINE YOUR OWN CODE
vs
How do I know what to refactor?
BACKGROUND
•  Married for 25+ years
•  Working as software developer/architect for 25 years
•  Weighs 20+ kg more
•  Will work 15(+/-?) more years trying to answer that question
EXTENDA
HOW SOFTWARE EVOLVES OVER TIME
Time
Very often becomes
POTENTIAL INDICATORS
  Code Decay
  Lack of design patterns
  Architecture violations
Code Smell
  God Class
  Data Class
  Code Clones
  Tradition Breaker
  Intensive Coupling
  …
SHOW ME THE NUMBERS
I want to refactor the code!
Why? What’s the business value? Show me the numbers!
?
FIND THE HOTSPOTS
  Problems
  Degraded velocity
  Long and breaking builds
  Increasing size and complexity
  SLA issues
  Diagnose/Action
  Test
  Code Review
  Refactor
STATIC CODE ANALYSIS
WHERE TO START?
LARGE/GOD CLASS
ATFD > 5
Access to foreign data
WMC > 46
Weighted method count
TCC < 0.33
Tight class cohesion
God Class
RESEARCH BASED APPROACH
MINING SOFTWARE REPOSITORIES
  Uses other sources as well
  Version Control Systems (Git, Mercurial, Perforce, …)
  Incident Systems (Jira, Bugtracker, ALM, …)
  Communication Platforms (Stack Overflow, Intranets, …)
  Build Servers (Jenkins, TeamCity, GO, …)
  Review Tools (Swarm, …)
  Organization Schemas
  …
  Adds time aspect
Time
MINING SOFTWARE REPOSITORIES
  Mining software repositories gives us technical and social/organizational
information that we can’t derive from a snapshot.
GOD CLASS
ATFD > 5
Access to foreign data
WMC > 46
Weighted method count
TCC < 0.33
Tight class cohesion
God Class
God classes are 4-17 times more defect prone
God classes are 5-7 times more change prone
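The detection rule on this slide can be sketched as a simple predicate. This is a minimal sketch, not a replacement for a static analysis tool: the ATFD and WMC thresholds come from the slide, the low-cohesion test (TCC below one third) is assumed from Lanza/Marinescu, and the class names and metric values are hypothetical.

```python
from dataclasses import dataclass

# ATFD and WMC thresholds are taken from the slide. The direction of the TCC
# test is an assumption based on Lanza/Marinescu: low cohesion means TCC
# below one third.
ATFD_FEW = 5
WMC_VERY_HIGH = 46
TCC_ONE_THIRD = 0.33


@dataclass
class ClassMetrics:
    name: str
    atfd: int   # Access To Foreign Data: attributes of other classes used
    wmc: int    # Weighted Method Count: sum of the method complexities
    tcc: float  # Tight Class Cohesion: share of directly connected method pairs


def is_god_class(m: ClassMetrics) -> bool:
    """Return True only when all three symptoms are present at once."""
    return m.atfd > ATFD_FEW and m.wmc > WMC_VERY_HIGH and m.tcc < TCC_ONE_THIRD


# Hypothetical metric values, as a static analysis tool might export them.
candidates = [
    ClassMetrics("OrderManager", atfd=12, wmc=80, tcc=0.10),
    ClassMetrics("Money", atfd=0, wmc=9, tcc=0.75),
]
god_classes = [m.name for m in candidates if is_god_class(m)]
```

Requiring all three conditions together keeps large but cohesive classes out of the result.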
DESIGN/TECHNICAL DEBT
  Finding the sweet spot
  When does the cost of maintenance exceed the cost to refactor?
– Value of debt (how much is it going to cost to fix it?)
– Interest rate (how much does it slow down development?)
– Probability (what is the chance that the debt affects productivity?)
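The three factors above can be combined into a rough break-even estimate. This is only a sketch under stated assumptions: the formula (interest weighted by probability, compared against the cost to fix) is one simple way to combine them, and every number below is hypothetical.

```python
from typing import Optional


def expected_monthly_interest(hours_lost_per_change: float,
                              changes_per_month: float,
                              probability: float) -> float:
    """Expected hours lost per month because of the debt.

    probability is the chance that a given change is actually slowed down.
    """
    return hours_lost_per_change * changes_per_month * probability


def break_even_months(fix_cost_hours: float,
                      monthly_interest_hours: float) -> Optional[float]:
    """Months until the accumulated interest exceeds the cost of fixing."""
    if monthly_interest_hours <= 0:
        return None  # the debt never costs anything, so refactoring never pays off
    return fix_cost_hours / monthly_interest_hours


# Hypothetical figures: 120 hours to refactor, 2 hours lost per change,
# 10 changes per month, and a 50 % chance that a change hits the debt.
interest = expected_monthly_interest(2, 10, 0.5)
months = break_even_months(120, interest)
```

With these figures the refactoring pays for itself after about a year, which only matters if the product is expected to live that long.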
CODE CHURN
CODE CHURN
  Research has shown that frequent changes to complex code
generally indicate declining quality
  The number of times code changes is a better predictor of defects
than pure size
  Modules that change frequently are linked to maintenance problems
and low quality (An Empirical Study on the Impact of Duplicate Code)
  Including a measure of change in the prior release is an essential
component of our fault prediction method. Individually, counts of adds
and modifications outperform counts of deletes, while the sum of all
three counts was most effective (Does Measuring Code Change
Improve Fault Prediction?)
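Churn counts of this kind can be derived from version control history. The sketch below parses text in the format printed by `git log --numstat`, which reports added and deleted lines per file (a modified line shows up as one added and one deleted). The sample log and file names are hypothetical, and renamed paths are not handled.

```python
from collections import defaultdict
from typing import Dict


def churn_per_file(numstat_text: str) -> Dict[str, Dict[str, int]]:
    """Sum added and deleted lines, and count changes, for every file."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"added": 0, "deleted": 0, "changes": 0}
    )
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        # A numstat row has three tab-separated fields: added, deleted, path.
        # Binary files report "-" for both counts and are skipped here, as
        # are commit header lines and blank lines.
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        entry = totals[parts[2]]
        entry["added"] += int(parts[0])
        entry["deleted"] += int(parts[1])
        entry["changes"] += 1
    return dict(totals)


# Hypothetical excerpt of numstat output covering two commits.
SAMPLE_LOG = "\n".join([
    "10\t2\tsrc/pos/Checkout.java",
    "1\t1\tREADME.md",
    "",
    "5\t5\tsrc/pos/Checkout.java",
    "-\t-\tlogo.png",
])
churn = churn_per_file(SAMPLE_LOG)
```

Sorting the result by `added + deleted` or by `changes` gives a first list of files worth a closer look.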
CODE MAAT
Command line tool to analyse VCS (Git, Mercurial, Subversion,
Team Foundation Server, Perforce)
  Input : VCS log file for the last X days/months/year(s)
  Output :
File Statistics (Number of files, age, …)
Organizational Metrics (number of authors, code ownership, …)
Coupling
Code Churn
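To make the organizational metrics concrete, here is a minimal sketch that computes the number of authors and a simple ownership share per file. It is not Code Maat's implementation: ownership is assumed here to be the main author's share of the changes to a file, and the author and file names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def ownership_report(changes: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Build per-file author statistics from (author, path) pairs.

    Each pair stands for one commit by that author touching that file.
    """
    per_file: Dict[str, Counter] = defaultdict(Counter)
    for author, path in changes:
        per_file[path][author] += 1

    report = {}
    for path, counts in per_file.items():
        main_author, main_count = counts.most_common(1)[0]
        report[path] = {
            "authors": len(counts),
            "main_author": main_author,
            # Assumption: ownership is the main author's share of all changes.
            "ownership": main_count / sum(counts.values()),
        }
    return report


# Hypothetical history extracted from a VCS log.
history = [
    ("alice", "A.java"),
    ("alice", "A.java"),
    ("bob", "A.java"),
    ("bob", "B.java"),
]
report = ownership_report(history)
```

Files with many authors and low ownership are candidates for a closer look at who actually knows the code.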
CODE MAAT
Code Maat
DEMO – CODE CHURN
Code Maat
https://github.com/adamtornhill/code-maat
Docker image https://github.com/peternorrhall/code-maat
DEMO EMPEAR
Code Maat +
Hotspots
  Settings/Filtering and Visualization
Performance
Only support for Git
EXTENDA - MSR
Time
Changelist/Files
Job
Defect
Requirement
Refactoring
Code Analysis
Categorisation
EXTENDA MSR
Pentaho
Data Integration
FINDINGS
  New module Self Checkout Client (device integration)
  A lot of development in 2014
  A lot of defects and refactorings in 2015 for the files with the highest
code churn and complexity. This matches the results in Empear
  XML complexity as well
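One simple way to combine the two signals mentioned above, churn and complexity, into a hotspot ranking is sketched below. The scoring is an assumption for illustration, not how Empear computes hotspots: revisions stand in for churn, lines of code stand in for complexity, and the file names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_hotspots(files: List[Dict]) -> List[Tuple[str, float]]:
    """Rank files by normalized change frequency times normalized size."""
    if not files:
        return []
    # "or 1" avoids dividing by zero when every value is 0.
    max_revisions = max(f["revisions"] for f in files) or 1
    max_loc = max(f["loc"] for f in files) or 1
    scored = [
        (f["path"], (f["revisions"] / max_revisions) * (f["loc"] / max_loc))
        for f in files
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical input: revisions from the VCS log, lines of code from a counter.
files = [
    {"path": "Util.java", "revisions": 2, "loc": 1500},
    {"path": "Scanner.java", "revisions": 40, "loc": 1200},
    {"path": "config.xml", "revisions": 35, "loc": 300},
]
hotspots = rank_hotspots(files)
```

A large file that never changes ends up near the bottom, which is the point: refactoring effort goes where the code is both complicated and frequently touched.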
STREAMS
  Task streams for larger work
Purpose stable main
main
dev
@
@
task
TEMPORAL COUPLING
Static code dependencies (Structure 101 on the Spring project)
TEMPORAL COUPLING
TestClassA
ClassA
ClassB
Research
•  Change coupling points to architectural weakness
•  Hotspots of refactoring candidates
•  Helps comprehension of system modularization
•  Spotting of misplaced components
•  Correlates with defects (in some cases)
Module A
Module B
DEMO - TEMPORAL COUPLING
EMPEAR
GRAPHVIZ
TEMPORAL COUPLING – USE CASES
Find patterns (.properties should be changed together)
Find hidden dependencies (modules)
  Lack of unit tests or too high velocity of unit tests
Interesting to see how it changes over time
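Temporal coupling can be estimated by counting how often two files change in the same commit. In this sketch the coupling degree is assumed to be the number of shared commits divided by the average number of revisions of the two files, expressed as a percentage; the commit data is hypothetical and reuses the class names from the earlier slide.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def temporal_coupling(commits: Iterable[Iterable[str]],
                      min_shared: int = 2) -> List[Tuple[str, str, int, int]]:
    """Return (file_a, file_b, shared_commits, degree_percent) tuples."""
    revisions: Counter = Counter()
    shared: Counter = Counter()
    for files in commits:
        unique = sorted(set(files))  # sort so every pair has one canonical order
        revisions.update(unique)
        shared.update(combinations(unique, 2))

    result = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue  # a single shared commit is likely coincidence
        average_revisions = (revisions[a] + revisions[b]) / 2
        result.append((a, b, count, round(100 * count / average_revisions)))
    return sorted(result, key=lambda row: row[3], reverse=True)


# Hypothetical commits, each listed as the set of files it touched.
commits = [
    ["ClassA.java", "TestClassA.java"],
    ["ClassA.java", "ClassB.java", "TestClassA.java"],
    ["ClassA.java", "ClassB.java"],
    ["ClassB.java"],
]
coupling = temporal_coupling(commits)
```

A class and its unit test changing together is expected; a strong coupling between files in different modules is the kind of hidden dependency worth investigating.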
ORGANIZATION AND OWNERSHIP
Time
Ownership where person is about to leave or has left + Age of code
WHAT IS YOUR BUSINESS CASE?
  Do you need to care about it in the first place?
  How long will your product/system live?
  Extenda 20+ years for some of our products
  Data scientists spend most of their time cleaning data
Remove "Build user", Streams, …
What type of commit – defect/refactor/new feature (explicit labeling
works well)
Finding the False Positives
Use the metrics you have
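The cleaning steps above can be sketched as a small filter. The "Build user" author name comes from the slide; the label convention (a `defect:`, `refactor:` or `feature:` prefix in the commit message) is a hypothetical example of explicit labeling, as are the sample commits.

```python
from typing import Dict, Iterable, List

EXCLUDED_AUTHORS = {"Build user"}  # automated commits distort the statistics

# Hypothetical labeling convention: the commit message starts with "<label>:".
LABELS = {"defect": "defect", "refactor": "refactor", "feature": "new feature"}


def clean_and_label(commits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop commits by excluded authors and add a 'kind' to the rest."""
    cleaned = []
    for commit in commits:
        if commit["author"] in EXCLUDED_AUTHORS:
            continue
        prefix = commit["message"].split(":", 1)[0].strip().lower()
        kind = LABELS.get(prefix, "unlabeled")
        cleaned.append({**commit, "kind": kind})
    return cleaned


# Hypothetical commits.
raw = [
    {"author": "Build user", "message": "Bump version"},
    {"author": "alice", "message": "Defect: scanner timeout"},
    {"author": "bob", "message": "refactor: split payment service"},
    {"author": "bob", "message": "Quick fix"},
]
labeled = clean_and_label(raw)
```

Commits that end up as "unlabeled" are where false positives tend to hide, so their share is worth tracking.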
USE AND VISUALIZE YOUR DATA
Free material from www.gapminder.org
THE GOAL
I want to refactor the code!
Why? Show me the numbers!
No problem, Boss!
QUESTIONS
THANK YOU FOR LISTENING!
Please ask or give feedback
  Email : peter.norrhall@extenda.com
LinkedIn : https://www.linkedin.com/in/peternorrhall
Twitter : https://twitter.com/peternorrhall
REFERENCES
  "Making Software: What Really Works, and Why We Believe It", Oram/Wilson
  "Object-Oriented Metrics in Practice", Lanza/Marinescu
  "Your Code as a Crime Scene", Tornhill
  "Investigating the Impact of Design Debt on Software Quality",
Zazworka/Seaman/Shull/Shaw
  MSR International Conference - http://2016.msrconf.org/
TOOLS
Code Maat - https://github.com/adamtornhill/code-maat
Code Maat Docker Image - https://github.com/peternorrhall/code-maat
Docker - https://www.docker.com/
Empear – http://www.empear.com
Graphviz – http://www.graphviz.org
  Git - https://git-scm.com/
  Git-P4 - https://git-scm.com/docs/git-p4
  MS Excel - https://products.office.com/sv-se/excel
Pentaho - http://community.pentaho.com/
Perforce - https://www.perforce.com/
  R Studio - https://www.rstudio.com/
SonarQube - http://www.sonarqube.org/
  Structure101 - http://structure101.com/
