The document presents a study on providing just-in-time suggestions for log changes when developers make code changes. The researchers analyzed over 32,000 log changes from 4 systems. They found 20 reasons for log changes that fall into 4 categories: block changes, log improvements, dependence-driven changes, and logging issues. A random forest classifier using 25 software metrics related to code changes, history, and complexity achieved 0.84-0.91 AUC in predicting whether a log change is needed. Change metrics and product metrics were the most influential factors. The study aims to help developers make better logging decisions for failure diagnosis.
5. Missing an important log can significantly increase the difficulty of failure diagnosis
Missing exception logging causes an error to be swallowed
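A minimal sketch of a swallowed error, assuming a Python service that uses the standard `logging` module. The logger name `orders` and both function names are hypothetical, chosen only for illustration; they do not come from the studied systems.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_quantity_silent(raw):
    """Swallows the error: the caller gets a default and nothing is logged."""
    try:
        return int(raw)
    except ValueError:
        return 0


def parse_quantity_logged(raw):
    """Same fallback, but the exception and its traceback are recorded."""
    try:
        return int(raw)
    except ValueError:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Could not parse quantity %r; falling back to 0", raw)
        return 0
```

Both functions return the same value on bad input; only the second leaves evidence for later failure diagnosis.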
6. Logging excessively is not an optimal solution
Large log files
Hard to find real errors
Hiding important information
7. Failing to update logging statements when changing the code
The source code checks only one invalid item, while the log says all invalid items are checked
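The mismatch described above can be sketched as follows. This is an illustrative reconstruction in Python, not code from the studied systems; the function `first_invalid` and the logger name `validator` are hypothetical.

```python
import logging

logger = logging.getLogger("validator")  # hypothetical logger name


def first_invalid(items, is_valid):
    """Return the first invalid item, or None if every item is valid.

    The loop stops at the first invalid item, so the log text must not
    claim that all invalid items were checked.
    """
    for item in items:
        if not is_valid(item):
            # A stale message here would read "Checked all invalid items".
            # The updated message matches what the code really does.
            logger.warning(
                "Found first invalid item %r; remaining items were not checked",
                item,
            )
            return item
    return None
```

Keeping the message in step with the early return is the kind of log change that a code change should trigger.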
8. We want to provide logging suggestions when developers make code changes
Code changes → Do we need to make log changes?
9. Towards just-in-time suggestions for log changes
RQ1. What are the reasons for changing logging statements?
RQ2. How well can we provide just-in-time log change suggestions?
RQ3. What are the influential factors that explain log changes?
10. Four case study systems
Over 2M lines of code
Over 13K logging statements
Over 40K commits
23%-30% of commits involve log changes
12. We manually coded the reasons for log changes
Random sampling of 380 log changes out of 32K log changes
Manually examined code changes, commit messages, and issue reports
Open coding approach
13. We find 20 reasons for log changes across four categories
[Bar chart, percentage per category: Block change 68%, Log improvement 17%, Dependence-driven change 10%, Logging issue 5%]
Block change (68%):
Changing a try-catch block
Changing a conditional (if/switch) branch
…
14. We find 20 reasons for log changes across four categories
[Bar chart, percentage per category: Block change 68%, Log improvement 17%, Dependence-driven change 10%, Logging issue 5%]
Log improvement (17%):
Improving debugging capability
Improving readability
…
15. We find 20 reasons for log changes across four categories
[Bar chart, percentage per category: Block change 68%, Log improvement 17%, Dependence-driven change 10%, Logging issue 5%]
Dependence-driven change (10%):
Logger change
Variable change
…
16. We find 20 reasons for log changes across four categories
[Bar chart, percentage per category: Block change 68%, Log improvement 17%, Dependence-driven change 10%, Logging issue 5%]
Logging issue (5%):
Inappropriate log level
Inappropriate log text
…
18. We derive 25 software metrics that are related to log changes
Change metrics: capture the code changes in a commit, e.g., the number of changed catch blocks
Historical metrics: capture the code changes in the history, e.g., the number of previous log changes
Product metrics: capture the snapshot of the source code, e.g., code complexity metrics
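As a rough illustration of a change metric, the sketch below counts changed lines in a unified diff that touch a catch clause or a logging call. It assumes Java-style source and loggers named `log` or `logger`; the regular expressions, the function name `change_metrics`, and the returned keys are hypothetical, and counting lines is only a crude proxy for counting changed catch blocks. It is not the metric extraction used in the study.

```python
import re

# Assumed patterns for Java-style code; real projects may differ.
CATCH_RE = re.compile(r"\bcatch\s*\(")
LOG_RE = re.compile(
    r"\b(?:log|logger)\.(?:trace|debug|info|warn|error|fatal)\s*\(",
    re.IGNORECASE,
)


def change_metrics(diff_text):
    """Compute a few change metrics from a unified diff (simplified)."""
    changed = [
        line[1:]
        for line in diff_text.splitlines()
        # Keep added/removed lines, skip the "+++"/"---" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return {
        "changed_lines": len(changed),
        "changed_catch_lines": sum(bool(CATCH_RE.search(l)) for l in changed),
        "changed_log_lines": sum(bool(LOG_RE.search(l)) for l in changed),
    }
```

Historical and product metrics would need the commit history and a code snapshot respectively, which this sketch does not cover.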
21. We use these software metrics as explanatory variables for log changes
Three dimensions, 25 metrics
23. We use a random forest classifier to model log changes
Three dimensions, 25 metrics → Random Forest Classifier → Log change suggestions
Robust against overfitting (using bootstrap samples to construct decision trees)
Measure variable importance through variable permutation
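Permutation-based variable importance can be sketched without any library: shuffle one metric's column, re-score the model, and record how much performance drops. The sketch below is model-agnostic and uses a hand-written stand-in predictor rather than a random forest; `accuracy`, `permutation_importance`, and the toy data in the usage note are assumptions for illustration only.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled.

    rows is a list of equal-length tuples; predict maps one row to a label.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

For example, with rows of the form `(changed_catch_blocks, unrelated_value)` and a predictor that only looks at the first element, the second feature gets an importance of exactly 0 while the first gets a positive value.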
24. Our models can effectively suggest whether a log change is needed
[Bar chart: AUC for the four systems: 0.84, 0.91, 0.86, 0.88, with a random-guess reference line]
The performance (AUC) of a within-project evaluation
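AUC can be read as the probability that the model scores a randomly chosen commit that needs a log change above a randomly chosen commit that does not, so 0.5 corresponds to random guessing. A minimal sketch of that pairwise definition follows; the function name `auc` and the example scores in the usage note are illustrative, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores: predicted likelihood that a commit needs a log change.
    labels: truthy if the commit actually contained a log change.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With scores `[0.9, 0.4, 0.6, 0.1]` and labels `[1, 1, 0, 0]`, three of the four positive/negative pairs are ordered correctly, so the function returns 0.75.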
25. Our models can effectively suggest whether a log change is needed
[Bar chart: AUC for the four systems. Within-project: 0.84, 0.91, 0.86, 0.88. Cross-project: 0.84, 0.88, 0.86, 0.87]
The performance (AUC) of a cross-project evaluation
27. Metrics are clustered into statistically distinct groups based on their importance
Bootstrapping → 1,000 random forest classifiers → Raw variable importance values
Scott-Knott Clustering (clustering variables into statistically distinct groups) → Statistically ranked variable importance
Raw variable importance values:
Metric    Imp 1    Imp 2    …
X1        …
X2        …
...       ...      …
Statistically ranked variable importance:
Rank group    Metric
1             X1, X5, X10
2             X3, X6
...           ...
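The core partition step of Scott-Knott clustering can be sketched as follows: sort the metrics by mean importance and pick the cut that maximizes the between-group sum of squares. This is a deliberately simplified sketch. It omits the significance test and the recursion that the full procedure applies to each resulting group, and the function name `best_split` and the importance values in the usage note are hypothetical.

```python
from statistics import mean


def best_split(mean_importance):
    """Split metrics into two groups that maximize between-group variation.

    mean_importance maps a metric name to its mean importance over the
    bootstrap runs; it must contain at least two metrics.
    """
    ranked = sorted(mean_importance.items(), key=lambda kv: kv[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    grand = mean(values)
    best_k, best_b = None, -1.0
    for k in range(1, len(values)):
        left, right = values[:k], values[k:]
        # Between-group sum of squares for this candidate cut.
        b = (len(left) * (mean(left) - grand) ** 2
             + len(right) * (mean(right) - grand) ** 2)
        if b > best_b:
            best_k, best_b = k, b
    return names[:best_k], names[best_k:]
```

Given made-up means such as `{"X1": 0.30, "X5": 0.28, "X10": 0.29, "X3": 0.10, "X6": 0.11}`, the cut falls between the three high-importance metrics and the two low-importance ones, mirroring the rank groups in the table above.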
28. Change metrics and product metrics are the most influential factors for log changes
Change metrics: capture the code changes in a commit, e.g., the number of changed catch blocks
Historical metrics: capture the code changes in the history, e.g., the number of previous log changes
Product metrics: capture the snapshot of the source code, e.g., code complexity metrics
37. Making good logging decisions is critical for failure diagnosis
Logs are usually the only resource for debugging field failures
[Diagram labels: Make logging decisions, Logger.error(), Release, Produce at run-time, Log collection, Log analytics tools, RegEx [^…], System issues]
We find 20 reasons for log changes across four categories
[Bar chart, percentage per category: Block change 68%, Log improvement 17%, Dependence-driven change 10%, Logging issue 5%]
Changing a try-catch block
Changing a conditional (if/switch) branch
…
We use a random forest classifier to model log changes
Three dimensions, 25 metrics → Random Forest Classifier → Log change suggestions
Metrics are clustered into statistically distinct groups based on their importance
Bootstrapping → 1,000 random forest classifiers → Raw variable importance values
Scott-Knott Clustering (clustering variables into statistically distinct groups) → Statistically ranked variable importance
Raw variable importance values:
Metric    Imp 1    Imp 2    …
X1        …
X2        …
...       ...      …
Statistically ranked variable importance:
Rank group    Metric
1             X1, X5, X10
2             X3, X6
...           ...
http://hengli.org
hengli@cs.queensu.ca