Towards Just-in-Time Suggestions for Log Changes
Journal-first Presentation | Empirical Software Engineering
Heng Li, Weiyi Shang, Ying Zou, Ahmed E. Hassan
[Figure: the logging life cycle: make logging decisions (e.g., Logger.error()), release, produce logs at run-time, log collection, log analytics tools (e.g., RegEx), system issues]
Logs are usually the only resource for debugging field failures
Making good logging decisions is critical for failure diagnosis
Missing an important log can significantly increase the difficulty of failure diagnosis
Missing an exception logging statement causes an error to be swallowed
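To make the swallowed-error case concrete, the hypothetical Python sketch below contrasts a handler that silently falls back to a default with one that logs the exception first. The function names, the logger name `config`, and the default port are illustrative assumptions, not code from the studied systems.

```python
import logging

logger = logging.getLogger("config")

DEFAULT_PORT = 8080  # hypothetical fallback value


def parse_port_silently(raw: str) -> int:
    # Anti-pattern: the error is swallowed and leaves no trace in the logs.
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_PORT


def parse_port_logged(raw: str) -> int:
    # Same fallback behavior, but the failure is recorded at ERROR level
    # together with its stack trace (logger.exception sets exc_info).
    try:
        return int(raw)
    except ValueError:
        logger.exception("Invalid port %r; falling back to %d", raw, DEFAULT_PORT)
        return DEFAULT_PORT
```

Both functions behave identically for callers; only the second one leaves evidence that a field failure could later be diagnosed from.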
Logging excessively is not an optimal solution
- Large log files
- Hard to find real errors
- Hiding important information
Failing to update logging statements when changing the code
The source code checks only one invalid item, while the log message says that all invalid items are checked
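A hypothetical Python sketch of this kind of inconsistency: after a code change, the validation checks only the first item, but the unchanged log message still claims that every item was checked. The function and logger names are illustrative assumptions; the second function shows the log message updated together with the code.

```python
import logging

logger = logging.getLogger("validator")


def validate_stale(items: list[int]) -> bool:
    # After a code change, only the first item is checked for a negative value...
    ok = not items or items[0] >= 0
    # ...but the logging statement was not updated and still claims full coverage.
    logger.info("Checked all %d items for invalid values", len(items))
    return ok


def validate_updated(items: list[int]) -> bool:
    ok = not items or items[0] >= 0
    # The log message now describes what the code actually does.
    logger.info("Checked the first of %d items for invalid values", len(items))
    return ok
```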
We want to provide logging suggestions when developers make code changes
Code changes → Do we need to make log changes?
Towards just-in-time suggestions for log changes
RQ1. What are the reasons for changing logging statements?
RQ2. How well can we provide just-in-time log change suggestions?
RQ3. What are the influential factors that explain log changes?
Four case study systems
- Over 2M lines of code
- Over 13K logging statements
- Over 40K commits
- 23% to 30% of commits involve log changes
RQ1. What are the reasons for changing logging statements?
We manually coded the reasons for log changes
- Random sampling of 380 log changes out of 32K log changes
- Manually examined code changes, commit messages, and issue reports
- Open coding approach
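The slides do not state how the sample size was chosen. One common way to arrive at a sample of about this size is the standard formula for estimating a proportion, n0 = z²·p(1−p)/e², followed by the finite population correction n = n0 / (1 + (n0 − 1)/N). The sketch below assumes a 95% confidence level (z = 1.96), a ±5% interval (e = 0.05), p = 0.5, and a population of about 32,000; these parameters are assumptions, not figures from the study.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Sample size for estimating a proportion, with finite population correction."""
    n0 = z * z * p * (1 - p) / (margin * margin)  # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)


print(sample_size(32_000))
```

Under these assumptions the result is 380, matching the sample size above; for a very large population the correction vanishes and the result approaches 385.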
We find 20 reasons for log changes across four categories
- Block change (68%): changing a try-catch block; changing a conditional (if/switch) branch; …
- Log improvement (17%): improving debugging capability; improving readability; …
- Dependence-driven change (10%): logger change; variable change; …
- Logging issue (5%): inappropriate log level; inappropriate log text; …
RQ2. How well can we provide just-in-time log change suggestions?
We derive 25 software metrics that are related to log changes
- Change metrics: capture the code changes in a commit, e.g., the number of changed catch blocks
- Historical metrics: capture the code changes in the history, e.g., the number of previous log changes
- Product metrics: capture a snapshot of the source code, e.g., code complexity metrics
We use these software metrics (25 metrics along three dimensions) as explanatory variables for log changes
We use a random forest classifier to model log changes
[Figure: the 25 metrics along three dimensions are the input to a random forest classifier, which produces log change suggestions]
- Robust against overfitting (using bootstrap samples to construct decision trees)
- Measures variable importance through variable permutation
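Training a random forest is beyond a short sketch (libraries such as scikit-learn provide `RandomForestClassifier` and `sklearn.inspection.permutation_importance`), so the pure-Python code below illustrates only the permutation idea for an already-trained classifier exposed as a `predict` function. Accuracy is used as the performance measure for simplicity; that choice, the repeat count, and all names are assumptions rather than the study's exact setup.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: list[Row], y: list[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: list[Row],
    y: list[int],
    feature: int,
    repeats: int = 10,
    seed: int = 0,
) -> float:
    """Mean drop in accuracy after randomly shuffling one feature column.

    Shuffling breaks the link between that feature and the label; a large
    drop means the model relies on the feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [
            list(row[:feature]) + [value] + list(row[feature + 1:])
            for row, value in zip(X, column)
        ]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / repeats
```

For a toy model that looks only at feature 0, permuting feature 1 leaves accuracy unchanged (importance 0.0), while permuting feature 0 causes a large drop.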
Our models can effectively suggest whether a log change is needed
AUC on the four case study systems (a random guess corresponds to an AUC of 0.5):
- Within-project evaluation: 0.84, 0.91, 0.86, 0.88
- Cross-project evaluation: 0.84, 0.88, 0.86, 0.87
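AUC can be read as the probability that a randomly chosen commit that needs a log change receives a higher score than a randomly chosen commit that does not, which is why a random guess yields 0.5. A minimal pure-Python sketch of that rank-based definition (ties count as one half); the same computation applies to within-project and cross-project evaluation, where only the training data of the model producing the scores differ.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

This O(n²) version favors clarity; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.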
RQ3. What are the influential factors that explain log changes?
Metrics are clustered into statistically distinct groups based on their importance
[Figure: bootstrapping yields 1,000 random forest classifiers; their raw variable importance values (Imp 1, Imp 2, … for each metric X1, X2, …) are clustered with Scott-Knott clustering into statistically distinct groups, giving a statistically ranked variable importance (e.g., rank group 1: X1, X5, X10; rank group 2: X3, X6; …)]
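A simplified sketch of the ranking step, taking as input the importance values that the bootstrapped classifiers produce for each metric (each metric needs at least two values). Metrics are sorted by mean importance and split recursively where the between-group sum of squares of the means is largest, as in Scott-Knott. The original Scott-Knott test decides each split with a likelihood-ratio test; this sketch instead keeps a split only if the two sides differ by a non-negligible Cohen's d (threshold 0.2), in the spirit of the Scott-Knott ESD variant. It is an assumption, not necessarily the exact procedure used in the study.

```python
from statistics import mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Absolute Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / pooled_var ** 0.5


def rank_groups(importance: dict[str, list[float]], negligible: float = 0.2) -> list[list[str]]:
    """Group metrics into ranked clusters, most important group first."""
    ordered = sorted(importance, key=lambda m: mean(importance[m]), reverse=True)

    def split(group: list[str]) -> list[list[str]]:
        if len(group) < 2:
            return [group]
        means = [mean(importance[m]) for m in group]
        overall = mean(means)
        best_k, best_ss = 1, -1.0
        for k in range(1, len(group)):  # try every split point of the sorted list
            left, right = means[:k], means[k:]
            ss = len(left) * (mean(left) - overall) ** 2 + len(right) * (mean(right) - overall) ** 2
            if ss > best_ss:
                best_k, best_ss = k, ss
        left_vals = [v for m in group[:best_k] for v in importance[m]]
        right_vals = [v for m in group[best_k:] for v in importance[m]]
        if cohens_d(left_vals, right_vals) < negligible:
            return [group]  # difference is negligible: keep as one rank group
        return split(group[:best_k]) + split(group[best_k:])

    return split(ordered)
```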
Change metrics and product metrics are the most influential factors for log changes
http://hengli.org
hengli@cs.queensu.ca

 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandIES VE
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
 

Recently uploaded (20)

A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdf
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 

Towards Just-in-Time Suggestions for Log Changes

  • 1. Towards Just-in-Time Suggestions for Log Changes. Journal-first presentation, Empirical Software Engineering. Heng Li, Weiyi Shang, Ying Zou, Ahmed E. Hassan.
  • 4. [Diagram: make logging decisions (Logger.error()), release, produce at run time, log collection, log analytics tools (RegEx [^…]), system issues.] Logs are usually the only resource for debugging field failures, so making good logging decisions is critical for failure diagnosis.
  • 5. Missing an important log can significantly increase the difficulty of failure diagnosis. Example: a missing exception log causes an error to be swallowed.
  • 6. Logging excessively is not an optimal solution either: it produces large log files, makes real errors hard to find, and hides important information.
  • 7. Failing to update logging statements when changing the code. Example: the source code checks only one invalid item, while the log message says all invalid items are checked.
  • 8. We want to provide logging suggestions when developers make code changes: given a code change, do we need to make log changes?
  • 9. Towards just-in-time suggestions for log changes. RQ1. What are the reasons for changing logging statements? RQ2. How well can we provide just-in-time log change suggestions? RQ3. What are the influential factors that explain log changes?
  • 10. Four case study systems: over 2M lines of code, over 13K logging statements, and over 40K commits. Between 23% and 30% of commits involve log changes.
  • 12. We manually coded the reasons for log changes. We randomly sampled 380 of the 32K log changes, manually examined the code changes, commit messages, and issue reports, and applied an open coding approach.
  • 13. We find 20 reasons for log changes across four categories. [Bar chart: percentage of sampled log changes per category: block change 68%, log improvement 17%, dependence-driven change 10%, logging issue 5%.] Block change examples: changing a try-catch block; changing a conditional (if/switch) branch; …
  • 14. Log improvement examples: improving debugging capability; improving readability; …
  • 15. Dependence-driven change examples: logger change; variable change; …
  • 16. Logging issue examples: inappropriate log level; inappropriate log text; …
  • 18. We derive 25 software metrics related to log changes, in three dimensions. Change metrics capture the code changes in a commit (e.g., the number of changed catch blocks). Historical metrics capture the code changes in the history (e.g., the number of previous log changes). Product metrics capture a snapshot of the source code (e.g., code complexity metrics).
  • 21. We use these software metrics (25 metrics in three dimensions) as explanatory variables for log changes.
  • 23. We use a random forest classifier to model log changes. It takes the 25 metrics from the three dimensions as input and produces log change suggestions. Random forests are robust against overfitting because each decision tree is built from a bootstrap sample. They also measure variable importance through variable permutation.
  • 25. Our models can effectively suggest whether a log change is needed. [Bar chart: AUC for the four case study systems. Within-project evaluation: 0.84, 0.91, 0.86, 0.88. Cross-project evaluation: 0.84, 0.88, 0.86, 0.87. Random guess: 0.5.]
  • 27. Metrics are clustered into statistically distinct groups based on their importance. Bootstrapping yields 1,000 random forest classifiers, which produce raw variable importance values for each metric (Imp 1, Imp 2, … for metrics X1, X2, …). Scott-Knott clustering then groups the variables into statistically distinct, ranked groups (e.g., rank 1: X1, X5, X10; rank 2: X3, X6; …).
  • 28. Change metrics and product metrics are the most influential factors for log changes, more so than historical metrics. Change metrics capture the code changes in a commit (e.g., the number of changed catch blocks). Product metrics capture a snapshot of the source code (e.g., code complexity metrics). Historical metrics capture the code changes in the history (e.g., the number of previous log changes).
  • 37. Summary. Logs are usually the only resource for debugging field failures, so making good logging decisions is critical for failure diagnosis. We find 20 reasons for log changes across four categories: block change (68%), log improvement (17%), dependence-driven change (10%), and logging issue (5%). We model log changes with a random forest classifier over 25 metrics in three dimensions. We cluster the metrics into statistically distinct groups by importance, using 1,000 bootstrapped random forests and Scott-Knott clustering. Contact: http://hengli.org, hengli@cs.queensu.ca
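Slide 10 reports that 23% to 30% of commits involve log changes. The slides do not say how log changes were identified. The sketch below shows one plausible way to label a commit from its unified diff and compute that ratio. It assumes Java-style logging calls on objects named `log`/`logger` with standard level methods. The regular expression and the function names are hypothetical and are not the authors' tooling.

```python
import re

# Hypothetical pattern: Java-style calls such as LOG.info(...) or logger.error(...).
LOG_CALL = re.compile(
    r"\b(?:log|logger)\s*\.\s*(?:trace|debug|info|warn|error|fatal)\s*\(",
    re.IGNORECASE,
)


def touches_logging(diff_text: str) -> bool:
    """Return True if an added or removed line of a unified diff contains a logging call."""
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not code
        if line.startswith(("+", "-")) and LOG_CALL.search(line):
            return True
    return False


def log_change_ratio(diffs: list[str]) -> float:
    """Fraction of commits (one diff each) that add, delete, or modify a logging call."""
    if not diffs:
        return 0.0
    return sum(touches_logging(d) for d in diffs) / len(diffs)


commits = [
    "--- a/A.java\n+++ b/A.java\n-    x = 1;\n+    x = 2;",
    "--- a/B.java\n+++ b/B.java\n+    LOG.warn(\"retrying \" + id);",
]
print(log_change_ratio(commits))  # 0.5
```

Because the check works on individual diff lines, it also counts a logging call that was merely moved. A real study would need a finer-grained comparison of logging statements.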
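Slide 12 samples 380 of roughly 32K log changes, but the slides do not state the sampling parameters. Suppose we assume a 95% confidence level and a ±5% margin of error, which is a common choice but is not confirmed by the slides. Under that assumption, Cochran's formula n0 = z²·p(1−p)/e² with the finite population correction n = n0 / (1 + (n0 − 1)/N) reproduces the same figure:

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with a finite population correction.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most conservative proportion.
    """
    n0 = z * z * p * (1 - p) / (margin * margin)  # infinite-population size, about 384.16
    n = n0 / (1 + (n0 - 1) / population)          # correct for a finite population
    return math.ceil(n)


print(sample_size(32_000))  # 380
```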
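Slide 18 groups the 25 metrics into change, historical, and product dimensions. The sketch below computes one illustrative metric from each dimension for a single commit:

- a line-level proxy for the number of changed catch blocks;
- the number of earlier commits to the file that changed logging;
- the file's non-blank lines of code.

The names and exact definitions are assumptions for illustration. The paper's metric definitions may differ.

```python
import re
from dataclasses import dataclass

CATCH = re.compile(r"\bcatch\s*\(")  # assumed Java catch-clause syntax


@dataclass
class CommitMetrics:
    changed_catch_blocks: int  # change metric (this commit)
    prior_log_changes: int     # historical metric (earlier commits)
    file_loc: int              # product metric (current snapshot)


def changed_catch_blocks(diff_text: str) -> int:
    """Count added/removed diff lines that open a catch clause (a crude proxy)."""
    return sum(
        1
        for line in diff_text.splitlines()
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))
        and CATCH.search(line)
    )


def compute_metrics(diff_text: str, history: list[bool], source: str) -> CommitMetrics:
    """history[i] is True if the i-th earlier commit to the file changed logging."""
    loc = sum(1 for line in source.splitlines() if line.strip())  # non-blank lines
    return CommitMetrics(changed_catch_blocks(diff_text), sum(history), loc)


diff = "+    } catch (IOException e) {\n+        LOG.error(\"read failed\", e);\n+    }"
source = "class A {\n\n  void f() {}\n}\n"
print(compute_metrics(diff, [True, False, True], source))
# CommitMetrics(changed_catch_blocks=1, prior_log_changes=2, file_loc=3)
```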
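Slide 23 notes that random forests measure variable importance through variable permutation. The idea is to shuffle one metric's column, re-score the model, and treat the drop in accuracy as that metric's importance. The sketch below uses only the standard library, so a hypothetical threshold rule stands in for the random forest; the permutation procedure is what it demonstrates. Using accuracy as the score and the `n_repeats` default are assumptions.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: list[Row], y: list[int]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], int], X: list[Row], y: list[int], n_repeats: int = 10, seed: int = 0
) -> list[float]:
    """Mean drop in accuracy when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in model: suggests a log change when feature 0
# (e.g., changed catch blocks) is positive; feature 1 is ignored.
def model(row: Row) -> int:
    return int(row[0] > 0)


data_rng = random.Random(1)
X = [[i % 2, data_rng.random()] for i in range(100)]
y = [int(row[0] > 0) for row in X]
print(permutation_importance(model, X, y))
```

Feature 1 scores exactly 0.0 because the model never reads it. Feature 0 scores roughly 0.5, because shuffling a balanced 0/1 column leaves predictions right only about half the time.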
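Slide 25 reports AUC values between 0.84 and 0.91, compared with 0.5 for a random guess. AUC is the probability that a randomly chosen positive example (here, a commit with a log change) receives a higher score than a randomly chosen negative one; ties count as half. Below is a minimal sketch of that pairwise definition. It runs in O(n·m), which is fine for illustration but not for large evaluations.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise definition."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5, a constant (uninformative) scorer
```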
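Slide 27 ranks metrics by applying Scott-Knott clustering to the importance values from 1,000 bootstrapped random forests. The sketch below is a simplified Scott-Knott-style partition, not the original test. Like Scott-Knott, it sorts metrics by mean importance, splits the sorted list where the between-group sum of squares is largest, and recurses. Unlike Scott-Knott, it stops splitting when the two sides differ by a negligible Cohen's d, instead of applying a likelihood-ratio significance test. The 0.2 threshold is an assumption, in the spirit of the Scott-Knott ESD variant. The input values are made up.

```python
from math import inf, sqrt
from statistics import mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Absolute Cohen's d using a pooled standard deviation."""
    va = variance(a) if len(a) > 1 else 0.0
    vb = variance(b) if len(b) > 1 else 0.0
    dof = len(a) + len(b) - 2
    pooled = sqrt(((len(a) - 1) * va + (len(b) - 1) * vb) / dof) if dof > 0 else 0.0
    diff = abs(mean(a) - mean(b))
    if pooled == 0.0:
        return inf if diff > 0 else 0.0
    return diff / pooled


def _partition(names: list[str], imp: dict[str, list[float]], threshold: float) -> list[list[str]]:
    if len(names) < 2:
        return [names]
    means = [mean(imp[n]) for n in names]
    grand = mean(means)
    best_k, best_ss = 1, -1.0
    for k in range(1, len(names)):  # candidate split points in the sorted order
        left, right = means[:k], means[k:]
        ss = len(left) * (mean(left) - grand) ** 2 + len(right) * (mean(right) - grand) ** 2
        if ss > best_ss:
            best_k, best_ss = k, ss
    left_vals = [v for n in names[:best_k] for v in imp[n]]
    right_vals = [v for n in names[best_k:] for v in imp[n]]
    if cohens_d(left_vals, right_vals) < threshold:
        return [names]  # negligible difference: keep as one rank group
    return _partition(names[:best_k], imp, threshold) + _partition(names[best_k:], imp, threshold)


def scott_knott(imp: dict[str, list[float]], threshold: float = 0.2) -> list[list[str]]:
    """Rank groups of metric names, most important group first."""
    names = sorted(imp, key=lambda n: mean(imp[n]), reverse=True)
    return _partition(names, imp, threshold)


importance = {  # hypothetical importance values from three bootstrap runs
    "X1": [0.80, 0.90, 1.00], "X5": [0.79, 0.89, 0.99],
    "X3": [0.41, 0.51, 0.61], "X6": [0.40, 0.50, 0.60],
    "X2": [0.00, 0.10, 0.20],
}
print(scott_knott(importance))  # [['X1', 'X5'], ['X3', 'X6'], ['X2']]
```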