Mine Your Own Code

How do I know what to refactor?

BACKGROUND
•  Married for 25+ years
•  Working as a software developer/architect for 25 years
•  Weighs 20+ kg more
•  Will work 15(+/-?) more years trying to answer that question

HOW SOFTWARE EVOLVES OVER TIME
Time
Very often becomes

POTENTIAL INDICATORS
  Code Decay
  Lack of design patterns
  Architecture violations
Code Smell
  God Class
  Data Class
  Code Clones
  Tradition Breaker
  Intensive Coupling
  …

SHOW ME THE NUMBERS
I want to refactor the code!
Why? What’s the business value? Show me the numbers!
?

FIND THE HOTSPOTS
  Problems
  Degraded velocity
  Long and breaking builds
  Increasing size and complexity
  SLA issues
  Diagnose/Action
  Test
  Code Review
  Refactor

LARGE/GOD CLASS
ATFD > 5
Access to foreign data
WMC > 46
Weighted method count
TCC < 0.33
Tight class cohesion
God Class
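
The three thresholds can be combined into one check. This is a minimal sketch, not the output of any particular analysis tool: the class names and metric values are made up, and cohesion is read as "TCC below one third", which is an assumption about the direction of the comparison.

```python
from dataclasses import dataclass

# Thresholds as quoted on the slide. The TCC comparison is read as
# "below one third" (low cohesion); that direction is an assumption.
ATFD_FEW = 5
WMC_VERY_HIGH = 46
TCC_ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    name: str
    atfd: int    # access to foreign data
    wmc: int     # weighted method count
    tcc: float   # tight class cohesion, 0.0 to 1.0


def is_god_class(m: ClassMetrics) -> bool:
    """All three symptoms must hold at the same time."""
    return m.atfd > ATFD_FEW and m.wmc > WMC_VERY_HIGH and m.tcc < TCC_ONE_THIRD


# Hypothetical metric values for two classes.
candidates = [
    ClassMetrics("OrderManager", atfd=12, wmc=88, tcc=0.10),
    ClassMetrics("Money", atfd=0, wmc=9, tcc=0.80),
]
god_classes = [m.name for m in candidates if is_god_class(m)]
```

The metric values themselves would come from a static analysis tool; only the combination rule is shown here.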

MINING SOFTWARE REPOSITORIES
  Uses other sources as well
  Version Control Systems (Git, Mercurial, Perforce, …)
  Incident Systems (Jira, Bugtracker, ALM, …)
  Communication Platforms (Stack Overflow, Intranets, …)
  Build Servers (Jenkins, TeamCity, GO, …)
  Review Tools (Swarm,
  Organization Schemas
  …
  Adds time aspect
Time

MINING SOFTWARE REPOSITORIES
  Mining software repositories gives us technical and social/organizational
information that we can’t derive from a snapshot.

GOD CLASS
ATFD > 5
Access to foreign data
WMC > 46
Weighted method count
TCC < 0.33
Tight class cohesion
God Class
God classes are 4-17 times more defect prone
God classes are 5-7 times more change prone

DESIGN/TECHNICAL DEBT
  Finding the sweet spot
  When does the cost of maintenance exceed the cost of refactoring?
– Value of debt (how much is it going to cost to fix it?)
– Interest rate (how much does it slow down development?)
– Probability (what is the chance that the debt affects productivity?)
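
One way to put the three factors together is a break-even estimate. This is a sketch under simple assumptions: a constant interest rate, a constant probability per sprint, and person-days as the unit. The numbers are hypothetical.

```python
def expected_interest(interest_per_sprint: float, probability: float, sprints: int) -> float:
    """Expected extra cost of leaving the debt in place over a planning horizon."""
    return interest_per_sprint * probability * sprints


def break_even_sprints(principal: float, interest_per_sprint: float, probability: float) -> float:
    """Number of sprints after which the expected interest equals the cost to fix."""
    expected_per_sprint = interest_per_sprint * probability
    if expected_per_sprint <= 0:
        return float("inf")  # debt that never bites never pays back a refactoring
    return principal / expected_per_sprint


# Hypothetical numbers, in person-days.
principal = 20.0     # value of debt: what it costs to fix
interest = 4.0       # slowdown per sprint when the code is touched
probability = 0.5    # chance that the code is touched in a given sprint

sprints = break_even_sprints(principal, interest, probability)
```

If the product is expected to live longer than the break-even point, the refactoring pays for itself under these assumptions.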

CODE CHURN
  Research has shown that frequent changes to complex code
generally indicate declining quality
  The number of times code changes is a better predictor of defects
than pure size
  Modules that change frequently are linked to maintenance problems
and low quality (An Empirical Study on the Impact of Duplicate Code)
  Including a measure of change in the prior release is an essential
component of our fault prediction method. Individually, counts of adds
and modifications outperform counts of deletes, while the sum of all
three counts was most effective (Does Measuring Code Change
Improve Fault Prediction?)
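
Churn per file can be approximated from version control history. The sketch below assumes the text comes from something like `git log --numstat --pretty=format:--%h`, and the sample log is invented. Git reports added and deleted lines only, so a modified line counts as one delete plus one add.

```python
from collections import Counter


def churn_per_file(numstat_log: str) -> Counter:
    """Sum added plus deleted lines per file from `git log --numstat` text."""
    churn = Counter()
    for line in numstat_log.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit header or blank line
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-"
        churn[path] += int(added) + int(deleted)
    return churn


# Invented sample: two commits, one binary file.
sample = (
    "--a1b2c3\n"
    "10\t2\tsrc/Pos.java\n"
    "-\t-\tlogo.png\n"
    "\n"
    "--d4e5f6\n"
    "5\t5\tsrc/Pos.java\n"
    "1\t0\tREADME.md\n"
)
top = churn_per_file(sample).most_common(1)
```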

CODE MAAT
Command-line tool for analysing VCS logs (Git, Mercurial, Subversion,
Team Foundation Server, Perforce)
  Input : VCS log file for the last X days/months/year(s)
  Output :
File Statistics (Number of files, age, …)
Organizational Metrics (number of authors, code ownership, …)
Coupling
Code Churn

DEMO – CODE CHURN
Code Maat
https://github.com/adamtornhill/code-maat
Docker image https://github.com/peternorrhall/code-maat

DEMO EMPEAR
Code Maat +
Hotspots
  Settings/Filtering and Visualization
Performance
Only supports Git
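
A rough hotspot ranking can be sketched by combining change frequency with a complexity proxy. The scoring below (revisions times lines of code) is an assumption for illustration and is not how Empear computes its hotspots. The file names and numbers are hypothetical.

```python
def rank_hotspots(
    revisions: dict[str, int],
    lines_of_code: dict[str, int],
    top: int = 3,
) -> list[tuple[str, int]]:
    """Rank files by revisions multiplied by size, largest score first."""
    scores = {
        path: revs * lines_of_code[path]
        for path, revs in revisions.items()
        if path in lines_of_code  # skip files that no longer exist
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical input: revisions from the VCS log, size from the current snapshot.
revisions = {"Pos.java": 40, "Util.java": 5, "Config.xml": 30, "Old.java": 12}
lines_of_code = {"Pos.java": 1200, "Util.java": 3000, "Config.xml": 400}

hotspots = rank_hotspots(revisions, lines_of_code)
```

Lines of code is a crude stand-in for complexity; any per-file complexity metric can be substituted.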

EXTENDA - MSR
Time
Changelist/Files
Job
Defect
Requirement
Refactoring
Code Analysis
Categorisation

EXTENDA MSR
Pentaho
Data Integration

FINDINGS
  New module Self Checkout Client (device integration)
  A lot of development in 2014
  A lot of defects and refactorings in 2015 for the files with the highest
code churn and complexity, in line with the results from
Empear
  XML complexity as well

STREAMS
  Task streams for larger work
Purpose: a stable main
[Diagram: main, dev, and task streams]

TEMPORAL COUPLING
Static code dependencies (Structure 101 on the Spring project)

TEMPORAL COUPLING
TestClassA
ClassA
ClassB
Research
•  Change coupling points to architectural weakness
•  Hotspots of refactoring candidates
•  Helps comprehension of system modularization
•  Spotting of misplaced components
•  Correlates with defects (in some cases)
Module A
Module B

TEMPORAL COUPLING – USE CASES
Find patterns (e.g. .properties files that should be changed together)
Find hidden dependencies (modules)
  Lack of unit tests, or unit tests that change too often
Interesting to see how it changes over time

ORGANIZATION AND OWNERSHIP
Time
Ownership where person is about to leave or has left + Age of code
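
The ownership part can be sketched as finding the main contributor per file and flagging files whose main contributor is leaving. The 50% threshold, the author names and the numbers are hypothetical. Age of code is not covered here.

```python
from collections import defaultdict


def main_owner(contributions):
    """contributions: iterable of (path, author, lines_added).

    Returns {path: (main_author, share_of_added_lines)}.
    """
    per_file = defaultdict(lambda: defaultdict(int))
    for path, author, added in contributions:
        per_file[path][author] += added
    owners = {}
    for path, authors in per_file.items():
        total = sum(authors.values())
        author, lines = max(authors.items(), key=lambda item: item[1])
        owners[path] = (author, lines / total if total else 0.0)
    return owners


def knowledge_at_risk(owners, leaving, min_share=0.5):
    """Files whose main owner is leaving and wrote at least min_share of the lines."""
    return sorted(
        path
        for path, (author, share) in owners.items()
        if author in leaving and share >= min_share
    )


# Hypothetical contributions, e.g. aggregated from VCS history.
contributions = [
    ("Pos.java", "anna", 300),
    ("Pos.java", "bo", 100),
    ("Scanner.java", "bo", 80),
    ("Scanner.java", "cia", 20),
]
owners = main_owner(contributions)
at_risk = knowledge_at_risk(owners, leaving={"anna"})
```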

WHAT IS YOUR BUSINESS CASE?
  Do you need to care about it in the first place?
  How long will your product/system live?
  Extenda: 20+ years for some of our products
  Data scientists spend most of their time cleaning data
Remove “Build user”, Streams, …
What type of commit – defect/refactor/new feature (explicit labeling
works well)
Finding the False Positives
Use the metrics you have
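
The cleaning steps above can be sketched as a filter plus a classifier. The `[label]` prefix convention and the label names are hypothetical; any explicit labeling scheme works the same way.

```python
import re

# Hypothetical labeling convention: commit messages start with "[label]".
LABELS = {"defect": "defect", "bug": "defect", "refactor": "refactor", "feature": "feature"}
EXCLUDED_AUTHORS = {"Build user"}
LABEL_PATTERN = re.compile(r"^\[(\w+)\]")


def classify(message: str) -> str:
    """Map an explicitly labeled commit message to a commit type."""
    match = LABEL_PATTERN.match(message.strip())
    if not match:
        return "unlabeled"
    return LABELS.get(match.group(1).lower(), "unlabeled")


def clean(commits):
    """Drop automated commits and classify the rest as (author, type)."""
    return [
        (author, classify(message))
        for author, message in commits
        if author not in EXCLUDED_AUTHORS
    ]


# Invented sample commits as (author, message).
commits = [
    ("Build user", "[feature] bump version"),
    ("anna", "[Defect] fix rounding"),
    ("bo", "tidy up"),
    ("cia", "[refactor] split PosService"),
]
cleaned = clean(commits)
```

Commits that end up as "unlabeled" are a useful measure of how much of the history can be trusted for defect and refactoring counts.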

USE AND VISUALIZE YOUR DATA
Free material from www.gapminder.org

THE GOAL
I want to refactor the code!
Why? Show me the numbers!
No problem, Boss!

THANK YOU FOR LISTENING!
Please ask or give feedback
  Email : peter.norrhall@extenda.com
LinkedIn : https://www.linkedin.com/in/peternorrhall
Twitter : https://twitter.com/peternorrhall

REFERENCES
  "Making Software: What Really Works, and Why We Believe It", Oram/Wilson
  "Object-Oriented Metrics in Practice", Lanza/Marinescu
  "Your Code as a Crime Scene", Tornhill
  "Investigating the Impact of Design Debt on Software Quality", Zazworka/Seaman/Shull/Shaw
  MSR International Conference - http://2016.msrconf.org/

TOOLS
Code Maat - https://github.com/adamtornhill/code-maat
Code Maat Docker Image - https://github.com/peternorrhall/code-maat
Docker - https://www.docker.com/
Empear – http://www.empear.com
Graphviz – http://www.graphviz.org
  Git - https://git-scm.com/
  Git-P4 - https://git-scm.com/docs/git-p4
  MS Excel - https://products.office.com/sv-se/excel
Pentaho - http://community.pentaho.com/
Perforce - https://www.perforce.com/
  R Studio - https://www.rstudio.com/
SonarQube - http://www.sonarqube.org/
  Structure101 - http://structure101.com/
