ScalaClean
https://github.com/rorygraves/scalaclean
“A spotless house is a sign of a misspent life...”
–Unknown
“Code is not just syntactic - you need to
understand deep complex architectures,
dependencies and codebases.”
–Zoë Weil
Status
Vaporware
Completely Broken
Probably Broken
Caveat Emptor!
Current State
• It's runnable on your project.

• It will probably break your code

• Testing framework in place - now playing ‘hunt the bug’
Software Quality
• Maintaining quality in an ever-changing project

• Debt continually accrues

• Measuring it is hard

• Paying it back can be hard

• It slows you down
Is writing quality software
important?
• Martin Fowler thinks so:

• https://martinfowler.com/articles/is-quality-worth-cost.html

• People have been thinking about it for a long
time.
How do we keep code
clean?
• Conscious design

• Not reinventing the wheel

• Continuous improvement

• Design

• House cleaning
Surely there is…
• ScalaStyle

• WartRemover

• Scapegoat (https://github.com/sksamuel/scapegoat)

• Linter (https://github.com/HairyFotr/linter)

• Scalac -Xlint 

• IntelliJ code inspections

• ScalaFix (https://github.com/scalacenter/scalafix)

• Scoverage
What are they checking
Summary
• Lots of rules (100s)

• Local errors and smells

• null use, unsafe patterns, invalid comparisons

• They can be quite opinionated

• They will find things wrong with your code!
So what is wrong?
• Local information only

• Cannot answer:

• who calls foo( )?

• who extends this?

• Back links need to be tracked at source and captured.
It’s a graph problem
• Generate the full program graph (calls, inheritance etc)

• Colour and walk the graph to discover

• who calls foo( ) (find usages)?

• what is the inheritance hierarchy of A?

• what does this method override?

• Back links need to be tracked at source and captured.
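The questions above can be sketched as lookups on a graph that stores both forward and back links. This is a minimal illustration only: `ProgramGraph`, `Edge` and the edge kinds are hypothetical names, not ScalaClean's actual data model, and symbols are plain strings for brevity.

```scala
// Hypothetical model: names and structure are illustrative assumptions.
sealed trait EdgeKind
case object Calls extends EdgeKind
case object Extends extends EdgeKind
case object Overrides extends EdgeKind

final case class Edge(from: String, to: String, kind: EdgeKind)

final class ProgramGraph(edges: Seq[Edge]) {
  // Forward links: what does a symbol refer to?
  private val outgoing: Map[String, Seq[Edge]] = edges.groupBy(_.from)
  // Back links: who refers to a symbol? Local-only tools lack this index.
  private val incoming: Map[String, Seq[Edge]] = edges.groupBy(_.to)

  private def sources(symbol: String, kind: EdgeKind): Set[String] =
    incoming.getOrElse(symbol, Nil).filter(_.kind == kind).map(_.from).toSet

  private def targets(symbol: String, kind: EdgeKind): Set[String] =
    outgoing.getOrElse(symbol, Nil).filter(_.kind == kind).map(_.to).toSet

  /** Who calls foo()? (find usages) */
  def callersOf(method: String): Set[String] = sources(method, Calls)

  /** Who extends this? (direct subtypes only) */
  def directSubtypesOf(tpe: String): Set[String] = sources(tpe, Extends)

  /** What does this method override? */
  def overrides(method: String): Set[String] = targets(method, Overrides)

  /** All transitive supertypes: the inheritance hierarchy above a type. */
  def supertypesOf(tpe: String): Set[String] = {
    @scala.annotation.tailrec
    def loop(todo: List[String], seen: Set[String]): Set[String] = todo match {
      case Nil => seen
      case head :: tail =>
        val next = targets(head, Extends) -- seen
        loop(next.toList ::: tail, seen ++ next)
    }
    loop(List(tpe), Set.empty)
  }
}

object GraphExample {
  // A made-up program: C extends B extends A, and B.foo overrides A.foo.
  val graph = new ProgramGraph(Seq(
    Edge("Main.main", "A.foo", Calls),
    Edge("B.bar", "A.foo", Calls),
    Edge("B", "A", Extends),
    Edge("C", "B", Extends),
    Edge("B.foo", "A.foo", Overrides)
  ))
}
```

With the `incoming` index in place, "who calls foo( )?" becomes a single map lookup rather than a scan of the whole program.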
ScalaClean - the plan
• Capture the dependency graph for a whole project

• Use that analysis to do things
The closed world
assumption
• All entry or extension points inferred or marked.

• Main methods

• Annotations

• Rules/Regex
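One way to picture the three kinds of marker is a small rule set that decides whether an element counts as an entry point. Everything here is an assumption made for illustration: `Element`, `EntryPointRules`, the annotation name and the package pattern are hypothetical and are not taken from ScalaClean.

```scala
import scala.util.matching.Regex

// Hypothetical element description; a real tool would obtain this from
// the compiler or from SemanticDB rather than from hand-written values.
final case class Element(fullName: String, annotations: Set[String], isMainMethod: Boolean)

final case class EntryPointRules(
    entryAnnotations: Set[String], // annotation names that mark an entry point (assumed)
    namePatterns: Seq[Regex]       // regex rules, e.g. from configuration (assumed)
) {
  def isEntryPoint(e: Element): Boolean =
    e.isMainMethod ||
      e.annotations.exists(entryAnnotations.contains) ||
      namePatterns.exists(_.pattern.matcher(e.fullName).matches())
}

object EntryPointExample {
  val rules = EntryPointRules(
    entryAnnotations = Set("org.junit.Test"),
    namePatterns = Seq("""com\.example\.api\..*""".r)
  )

  val elements = Seq(
    Element("com.example.Main.main", Set.empty, isMainMethod = true),
    Element("com.example.FooTest.works", Set("org.junit.Test"), isMainMethod = false),
    Element("com.example.api.Service.handle", Set.empty, isMainMethod = false),
    Element("com.example.internal.Helper.unused", Set.empty, isMainMethod = false)
  )

  // Everything except the internal helper is treated as a root of the closed world.
  val entryPoints: Seq[String] = elements.filter(rules.isEntryPoint).map(_.fullName)
}
```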
Capturing the information
• SemanticDB/ScalaMeta

• Compiler Plugin

• Scala Reflection / Java Reflection
So what can we do?
• Once you have the graph you can do many things:

• Dead code detection

• Privacy scoping

• Build profile analysis

• Other code cleanups
Dead code analysis
(and removal)
• Finding dead code is hard to do manually

• Used method, unused class

• circular references

• long chains

• Easier in pure functional code
Dead code analysis
(and removal)
• It is a graph colouring problem.

• Colour from the entry points and apis

• and possibly tests

• Anything not coloured is unreachable and dead.

• Compare test reachability vs entry reachability
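The colouring step can be sketched as a plain reachability walk. This is a minimal sketch under assumed names (`DeadCode`, `reachable`, and the sample symbols are all hypothetical); it also shows why circular references are not a problem for this approach, since a cycle that no root reaches is never coloured.

```scala
object DeadCode {
  /** Colour every node reachable from `roots` by following outgoing references. */
  def reachable(refs: Map[String, Set[String]], roots: Set[String]): Set[String] = {
    @scala.annotation.tailrec
    def loop(frontier: Set[String], coloured: Set[String]): Set[String] =
      if (frontier.isEmpty) coloured
      else {
        val next = frontier.flatMap(n => refs.getOrElse(n, Set.empty[String])) -- coloured
        loop(next, coloured ++ next)
      }
    loop(roots, roots)
  }

  // Made-up project: A.ping and B.pong call each other, but nothing live calls them.
  val refs: Map[String, Set[String]] = Map(
    "Main.main"        -> Set("Service.run"),
    "Service.run"      -> Set("Util.format"),
    "ServiceTest.test" -> Set("Service.run", "TestHelper.fixture"),
    "A.ping"           -> Set("B.pong"),
    "B.pong"           -> Set("A.ping")
  )
  val all: Set[String] = refs.keySet ++ refs.values.flatten

  val fromEntries: Set[String] = reachable(refs, Set("Main.main"))
  val fromTests: Set[String]   = reachable(refs, Set("ServiceTest.test"))

  // Not coloured from any root: unreachable, so dead.
  val dead: Set[String] = all -- fromEntries -- fromTests
  // Coloured only from tests: code that exists purely to be tested.
  val testOnly: Set[String] = fromTests -- fromEntries
}
```

Comparing the two colourings separates code that is truly dead from code that is kept alive only by its tests.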
Dead code analysis
(and removal)
• Colouring
Dead code analysis
Limitations
• Difficult to automatically detect all entry points

• Reflection

• API Entry points (annotate)

• Solution - annotation or config
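A small example of why reflection defeats a purely static graph: the method below has no caller in source, so a call graph alone would colour it dead. The names `ReflectionExample`, `Plugin` and `invokeByName` are invented for illustration.

```scala
object ReflectionExample {
  class Plugin {
    // No static caller anywhere in the program.
    def start(): String = "started"
  }

  // The method name exists only as a runtime string, so no call edge is recorded.
  def invokeByName(target: AnyRef, methodName: String): AnyRef =
    target.getClass.getMethod(methodName).invoke(target)

  val result: AnyRef = invokeByName(new Plugin, "start")
}
```

Marking `Plugin.start` as an entry point, by annotation or in configuration, is what keeps it from being removed.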
Privatisation
• Reduce the scope of code to minimise visibility (private/protected etc.).

• Why?

• It's good practice

• code completion 

• Incremental compilation

• pipelined compilation
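Given the usage graph, choosing a visibility can be sketched as picking the narrowest modifier that still admits every known user. This is a deliberately simplified model with hypothetical names (`Location`, `Privatiser`, `narrowest`); it ignores companions, nested scopes and overriding, all of which a real tool would have to handle.

```scala
// Hypothetical, simplified model of where an element lives and who uses it.
final case class Location(pkg: String, cls: String)

sealed trait Visibility
case object Private extends Visibility                           // private
case object Protected extends Visibility                         // protected
final case class PackagePrivate(pkg: String) extends Visibility  // private[pkg]
case object Public extends Visibility

object Privatiser {
  /**
   * Suggest the narrowest visibility that still admits every known use.
   * `subclassesOfOwner` holds the classes known to extend the owner.
   */
  def narrowest(owner: Location, users: Set[Location], subclassesOfOwner: Set[Location]): Visibility =
    if (users.forall(_ == owner)) Private
    else if (users.forall(u => u == owner || subclassesOfOwner(u))) Protected
    else if (users.forall(_.pkg == owner.pkg)) PackagePrivate(owner.pkg)
    else Public
}
```

For example, a method on `com.example.core.Parser` that is only used by `com.example.core.Lexer` would be suggested as `private[com.example.core]` under this model.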
Privatisation - 1
• How does privatisation help code completion?

• Hiding methods/fields in imported classes highlights
the public api (they will be suggested first)
Privatisation - 3
• How does privatisation help incremental compilation?

• Fewer dependencies between files

• fewer invalidations
Privatisation - 4
• Enables faster compilation

• Outline typing - do enough typing to compile
downstream

• Privatisation minimises the outline and work 

• Esp. if public methods have types ascribed.
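A small illustration of the last two points, using invented names (`Outline`, `UserService`): the public method carries an explicit result type, while the private member stays out of the outline altogether.

```scala
object Outline {
  final case class User(name: String)

  class UserService {
    // Public with an ascribed type: the signature alone tells downstream
    // code what it needs, without typechecking the method body.
    def find(name: String): Option[User] = cache.get(name)

    // Private: not visible downstream, so its type can be left to inference.
    private val cache = Map("ada" -> User("ada"))
  }
}
```

The fewer public members a class has, and the more of them have ascribed types, the less there is to compute before downstream code can start compiling.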
Build diamonds
Code improvements
• Remove unused parameters

• remove constant parameters

• Relax types
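The first two cleanups might look like this on a toy method. The names are made up, and the "every caller passes the same value" premise is an assumption that only whole-program analysis could establish.

```scala
object ParameterCleanup {
  // Before: `verbose` is never read, and (by assumption) every caller passes sep = ", ".
  def renderBefore(items: Seq[String], sep: String, verbose: Boolean): String =
    items.mkString(sep)

  // After: the unused parameter is removed and the constant one is inlined.
  def renderAfter(items: Seq[String]): String =
    items.mkString(", ")
}
```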
Code improvements
• Diego - https://github.com/rorygraves/ScalaClean/issues/25

• Relax parameters to supertypes.

• If B can be replaced with its super interface A, change the type to A

• “Some "proverbs" of engineering, to which this rule
help, are those of programming against an interface,
not an implementation; as well as the old minimum
access.”
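A sketch of relaxing a parameter to a supertype, with invented types: `Named` plays the role of A and `Employee` the role of B. The method only ever reads `name`, so it can accept the interface instead of the implementation.

```scala
object RelaxTypes {
  trait Named { def name: String }                                     // the role of A
  final case class Employee(name: String, salary: Int) extends Named  // the role of B
  final case class Pet(name: String) extends Named

  // Before: tied to one implementation, although only `name` is used.
  def greetBefore(e: Employee): String = s"Hello, ${e.name}"

  // After: programmed against the interface; existing callers still compile.
  def greetAfter(n: Named): String = s"Hello, ${n.name}"
}
```

After the change, `greetAfter` also accepts a `Pet`, which `greetBefore` would have rejected.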
Observations
• Tooling is hard

• Underneath it all we are modelling a complex language

• Data not in consistent forms

• No canonical representation

• Reverse engineering missing information is painful
What next?
• Continuing development

• Finding the bugs

• Fixing the bugs

• Testing it on big projects

• Seeing where the data leads us….
Conclusion
• Whole-program static analysis is possible and leads to some
interesting possibilities

• ScalaClean is not yet a turnkey solution

• Lots of work to do - lots of possibilities
“Software is like gardening - one day I'll go behind
the shed and clean up. But if nobody ever goes
there, does it matter a lot?”
–Mike Krieger (co-founder of Instagram)
Questions?
“Except, it's not the garden, it is the house we are
renovating whilst living in it. And that weird issue
with the drain is going to keep hurting us till we fix
it.”
–Rory

ScalaClean at ScalaSphere 2019
