Software Analytics
for Pragmatists
Solving problems –
automated, data-centric and reproducible
Markus Harrer
software analytics clean code
“Without data you're
just another person
with an opinion.”
W. Edwards Deming
Motivation Software Analytics
“In software engineering
there is much we are seeing,
but little we are learning.”
Tim Menzies
State of the Art
Questions from daily work
Motivation
Why?
• Make problems visible
+ Improve clarity and understanding
• Drive decisions
+ Raise money
+ Raise more money for further analysis
• Support continuous learning
+ Master challenges
+ Strive for steady improvement
Claim
Solving problems –
automated,
data-centric and
reproducible.
Automation
• Data retrieval
• Analysis
• Visualization
DevOps style
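The three automated steps above can be sketched as one small script. This is only an illustration under stated assumptions: the sample data, the column names and the function names are hypothetical, and it uses only the Python standard library so it runs anywhere. In the pipeline shown later, Pandas and matplotlib would typically play these roles.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real export (e.g. build results).
RAW_CSV = """job,status
build-1,SUCCESS
build-2,FAILURE
build-3,SUCCESS
build-4,SUCCESS
"""


def retrieve(source: str) -> list[dict]:
    """Data retrieval: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def analyze(rows: list[dict]) -> Counter:
    """Analysis: count how often each build status occurs."""
    return Counter(row["status"] for row in rows)


def visualize(counts: Counter) -> str:
    """Visualization: render the counts as a plain-text bar chart."""
    return "\n".join(
        f"{status:<8} {'#' * n} ({n})" for status, n in counts.most_common()
    )


def run_pipeline(source: str) -> str:
    """Run all three steps without manual intervention."""
    return visualize(analyze(retrieve(source)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step is a function, the whole analysis can be re-run unattended whenever the data changes, for example from a scheduled job.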
Types of software artifacts and meta data
• chronological
• community
• runtime
• static
Data is dirty
Time spent cleaning up data:
up to 80%
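A minimal sketch of what such clean-up work can look like. The records, field names and date formats below are invented for illustration; real data will need its own rules. It uses only the Python standard library.

```python
from datetime import datetime

# Hypothetical raw records with typical defects: stray whitespace,
# inconsistent casing, mixed date formats, missing values, duplicates.
RAW_ROWS = [
    {"author": " Alice ", "date": "2017-03-01", "lines": "120"},
    {"author": "alice", "date": "01.03.2017", "lines": "120"},  # duplicate
    {"author": "Bob", "date": "2017-03-02", "lines": ""},       # missing value
    {"author": "BOB", "date": "2017-03-03", "lines": "45"},
]

# Assumption: only these two date formats occur in the data.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def parse_date(text):
    """Try each known format; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    """Normalize names and dates, drop unusable rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        author = row["author"].strip().lower()
        date = parse_date(row["date"])
        lines = row["lines"].strip()
        if date is None or not lines.isdigit():
            continue  # drop rows that cannot be interpreted
        record = (author, date, int(lines))
        if record in seen:
            continue  # drop exact duplicates after normalization
        seen.add(record)
        cleaned.append({"author": author, "date": date, "lines": int(lines)})
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW_ROWS):
        print(row)
```

Even in this tiny example the cleaning rules take more code than a typical analysis step would, which matches the point of the slide.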
Data + Automation + Traceability
Replication
Code Book
Example:
Analyzing performance bottlenecks
Pipeline
Data Mining
• NumPy
• scikit-learn
• SciPy
Visualization
• matplotlib
• plot.ly
• Bokeh
• python-pptx
...
Pipeline
Python, Jupyter
• Input: XML/Graph, tables, text
• Preprocessing: jQAssistant, Neo4j, Pandas
• Analysis: Pandas, ...
• Output: matplotlib, D3, xlsx, pptx
Pipeline
ZIP
GZ
*.class
JAR, WAR, EAR
MANIFEST.MF
*.properties
XSD
YAML
XML
application.xml
web.xml
beans.xml
JaCoCo
FindBugs
CheckStyle
pom.xml
surefire-reports.xml
RDBMS Schema
M2 Repository
DB
CSV
Excel
BigQuery
Inputs
HDFStore
Web
JSON
Git
Pandas
jQAssistant
Examples
Demos
• Performance Bottlenecks
• Build Breaker Analysis
• Mining Knowledge Islands
• Git Log Analysis
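As a rough idea of what the Git log demo could involve, the sketch below counts how often each file changes. It is not the code from the talk. It assumes the log was exported with `git log --numstat --pretty=format:"--%h--%an"`; the embedded sample text, file paths and function names are made up, and only the Python standard library is used.

```python
from collections import Counter

# Hypothetical output of:
#   git log --numstat --pretty=format:"--%h--%an"
# Each commit header is followed by "added<TAB>deleted<TAB>path" lines.
SAMPLE_LOG = """--a1b2c3d--Alice
10\t2\tsrc/main/java/Order.java
3\t0\tsrc/main/java/Customer.java

--d4e5f6a--Bob
7\t7\tsrc/main/java/Order.java

--0a1b2c3--Alice
1\t1\tsrc/main/java/Order.java
-\t-\tdocs/logo.png
"""


def parse_numstat(log_text):
    """Return a list of (author, path, changed_lines) tuples."""
    changes = []
    author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            # Commit header in the custom format: --<hash>--<author>
            _, _sha, author = line.split("--", 2)
        elif line.strip():
            added, deleted, path = line.split("\t", 2)
            if added == "-":
                continue  # Git reports "-" for binary files
            changes.append((author, path, int(added) + int(deleted)))
    return changes


def change_frequency(changes):
    """Count how many commits touched each file (candidate hotspots)."""
    return Counter(path for _author, path, _lines in changes)


def churn_by_author(changes):
    """Sum added and deleted lines per author."""
    churn = Counter()
    for author, _path, lines in changes:
        churn[author] += lines
    return churn


if __name__ == "__main__":
    parsed = parse_numstat(SAMPLE_LOG)
    print(change_frequency(parsed).most_common(3))
    print(churn_by_author(parsed))
```

Files that change very often are a reasonable first place to look for problems, and per-author figures hint at who knows which part of the code.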
Experience
Learning by Doing!
Always ask the question:
What's the value of the information
relative to the effort
of analyzing it?
References
Leek, Jeff: The Elements of Data Analytic Style.
LeanPub, 2015.
McKinney, Wes: Python for Data Analysis. O’Reilly,
2012.
Mens, Tom; Serebrenik, Alexander; Cleve, Anthony: Evolving
Software Systems. Springer, 2014.
Mens, Tom; Demeyer, Serge: Software Evolution.
Springer, 2008.
Shull, Forrest; Singer, Janice; Sjøberg, Dag I.K.: Guide
to Advanced Empirical Software Engineering. Springer,
2008.
Tornhill, Adam: Your Code As a Crime Scene.
Pragmatic Programmers, 2015.
Your Questions

Software Analytics for Pragmatists [DevOps Camp 2017]