Towards Just-in-Time Suggestions for Log Changes

Journal-first Presentation | Empirical Software Engineering
Heng Li, Weiyi Shang, Ying Zou, Ahmed E. Hassan

[Figure: logging workflow. Labels: Make logging decisions (Logger.error()), Release, Produce at run-time, Log collection, Log analytics tools (RegEx [^…]), System issues]

Logs are usually the only resource
for debugging field failures

Making good logging decisions is critical for
failure diagnosis

Missing an important log can significantly
increase the difficulty of failure diagnosis
A missing exception log causes the error to be swallowed.
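The slide's own code example is not preserved in this text, so the following is only a minimal Python sketch of the problem. The function names `parse_port_silent` and `parse_port_logged`, the logger name, and the fallback port are all hypothetical. The first variant swallows the exception; the second keeps the same fallback but records the failure.

```python
import logging

logger = logging.getLogger("config")  # hypothetical logger name


def parse_port_silent(raw):
    """Swallows the error: the caller only ever sees the default value."""
    try:
        return int(raw)
    except ValueError:
        return 8080  # the failure leaves no trace in the logs


def parse_port_logged(raw):
    """Same fallback, but the failure is recorded for later diagnosis."""
    try:
        return int(raw)
    except ValueError:
        logger.error("Invalid port %r, falling back to 8080", raw, exc_info=True)
        return 8080
```

Both functions return the same value for bad input; only the second leaves evidence that something went wrong in the field.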

Logging excessively is not an optimal
solution
Large log files
Hard to find real errors
Hiding important information

Failing to update logging statements
when changing the code
The source code checks only one invalid item, while the log says that all invalid items are checked.

We want to provide logging suggestions
when developers make code changes
Code changes: do we need to make log changes?

Towards just-in-time suggestions for
log changes
RQ1. What are the reasons for changing logging statements?
RQ2. How well can we provide just-in-time log change suggestions?
RQ3. What are the influential factors that explain log changes?

Four case study systems
Over 2 M lines of code
Over 13 K logging statements
Over 40 K commits
23% - 30% of commits involve log changes

RQ1. What are the reasons for changing logging statements?

We manually coded the reasons for
log changes
Random sampling of 380 log changes out of 32K log changes
Manually examined code changes, commit messages, and issue reports
Open coding approach
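The slide does not say how the sample size of 380 was chosen. One common convention, assumed here purely for illustration, is a 95% confidence level with a 5% margin of error, computed with Cochran's formula and a finite population correction. The function name `sample_size` and its defaults are hypothetical.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's formula with finite population correction.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative assumption about the proportion being estimated.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


print(sample_size(32_000))  # 380
```

Under these assumed parameters, a population of about 32K yields 380, which matches the figure on the slide.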

We find 20 reasons for log changes across
four categories
Percentage of sampled log changes per category, with example reasons:
Block change (68%)
- Changing a try-catch block
- Changing a conditional (if/switch) branch
- …
Log improvement (17%)
- Improving debugging capability
- Improving readability
- …
Dependence-driven change (10%)
- Logger change
- Variable change
- …
Logging issue (5%)
- Inappropriate log level
- Inappropriate log text
- …

RQ2. How well can we provide just-in-time log change suggestions?

We derive 25 software metrics that are
related to log changes
Change metrics: capture the code changes in a commit, e.g., the number of changed catch blocks
Historical metrics: capture the code changes in the history, e.g., the number of previous log changes
Product metrics: capture the snapshot of the source code, e.g., code complexity metrics
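As an illustration of what a change metric might look like in practice, the sketch below counts added or removed lines in a unified diff that open a catch block. This is a crude line-based proxy and not the authors' metric extraction; the function name `count_changed_catch_lines` and the regular expression are assumptions made for this example.

```python
import re

# Matches the opening of a Java-style catch clause, e.g. "} catch (IOException e) {"
CATCH = re.compile(r"\bcatch\s*\(")


def count_changed_catch_lines(diff_text):
    """Count added or removed lines in a unified diff that open a catch block."""
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not changed content (a simplification)
        if line.startswith(("+", "-")) and CATCH.search(line[1:]):
            count += 1
    return count
```

A real implementation would more likely parse the code before and after the commit and compare catch blocks structurally; the line-based count is only meant to show the idea of deriving a number from a commit.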

We use these software metrics as
explanatory variables for log changes
Three dimensions, 25 metrics

We use a random forest classifier
to model log changes
[Diagram: three dimensions, 25 metrics → Random Forest Classifier → log change suggestions]
- Robust against overfitting (using bootstrap samples to construct decision trees)
- Measures variable importance through variable permutation
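The two properties named above can be sketched in a few lines of Python. This is not a random forest: `predict` is a stand-in for any trained model, and the helper names are hypothetical. `bootstrap_sample` draws rows with replacement, as is done when building each tree, and `permutation_importance` measures how much accuracy drops when one feature column is shuffled.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return rng.choices(rows, k=len(rows))


def accuracy(predict, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, rng):
    """Accuracy drop after shuffling one feature column across rows."""
    baseline = accuracy(predict, X, y)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return baseline - accuracy(predict, X_perm, y)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[i - 10, i % 3] for i in range(20)]   # toy data: two features per row
    y = [row[0] > 0 for row in X]              # label depends on feature 0 only
    model = lambda row: row[0] > 0             # stand-in model, ignores feature 1
    for f in (0, 1):
        print(f, permutation_importance(model, X, y, f, rng))
```

Because the stand-in model ignores feature 1, shuffling that column cannot change any prediction, so its importance is exactly zero; shuffling feature 0 can only lower accuracy.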

Our models can effectively suggest
whether a log change is needed
The performance (AUC) of a within-project evaluation
AUC: 0.84, 0.91, 0.86, 0.88 (chart baseline: random guess)

The performance (AUC) of a cross-project evaluation
Within-project AUC: 0.84, 0.91, 0.86, 0.88
Cross-project AUC: 0.84, 0.88, 0.86, 0.87
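For readers unfamiliar with the measure, AUC can be read as the probability that a randomly chosen positive case (here, a commit that needs a log change) receives a higher score than a randomly chosen negative case. A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The sketch below computes it directly from that definition; the function name `auc` and the toy data are illustrative only.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; rank-based formulations are the usual choice for large data sets.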

RQ3. What are the influential factors that explain log changes?

Metrics are clustered into statistically
distinct groups based on their importance
Bootstrapping: 1,000 random forest classifiers produce raw variable importance values
Metric  Imp 1  Imp 2  …
X1      …
X2      …
...     ...    …
Scott-Knott clustering: clustering variables into statistically distinct groups gives statistically ranked variable importance
Rank group  Metric
1           X1, X5, X10
2           X3, X6
...         ...
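The partitioning idea behind Scott-Knott clustering can be sketched as follows. This is a simplified illustration, not the full procedure: it performs a single split of the metrics, ranked by mean importance, at the point that maximizes the between-group sum of squares, and it omits both the statistical significance test and the recursion into each group. The function name `best_split` and the toy importance values are hypothetical.

```python
from statistics import mean


def best_split(importances):
    """Split metrics into two groups by mean importance.

    importances: dict mapping metric name -> list of importance values,
    one per bootstrap run. Needs at least two metrics.
    Returns (higher_group, lower_group).
    """
    ranked = sorted(importances, key=lambda m: mean(importances[m]), reverse=True)
    means = [mean(importances[m]) for m in ranked]
    grand = mean(means)
    best_cut, best_ss = 1, -1.0
    for cut in range(1, len(means)):
        left, right = means[:cut], means[cut:]
        # Between-group sum of squares for this candidate partition.
        ss = (len(left) * (mean(left) - grand) ** 2
              + len(right) * (mean(right) - grand) ** 2)
        if ss > best_ss:
            best_cut, best_ss = cut, ss
    return ranked[:best_cut], ranked[best_cut:]


toy = {
    "X1": [0.90, 0.92],
    "X5": [0.88, 0.90],
    "X3": [0.30, 0.32],
    "X6": [0.28, 0.30],
}
print(best_split(toy))  # (['X1', 'X5'], ['X3', 'X6'])
```

In the full method, a split is kept only if the two groups differ significantly, and the procedure is then applied again within each group, which is what produces the ranked groups shown in the table.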

Change measures and product measures
are the most influential factors for log
changes

http://hengli.org
hengli@cs.queensu.ca
