1
Understanding Log Lines Using
Development Knowledge
Ahmed E. Hassan, Meiyappan Nagappan, Weiyi Shang, Zhen Ming Jiang
Practitioners have challenges in understanding
log lines
2
Fetch failure
What exactly
does this
message mean?
What could
be the
cause?
Is it affecting
my data?
Practitioners either ask experts to help or
search online for log inquiries
3
We performed an exploratory study on 3 large
software systems
4
Zookeeper
5,641 logging
statements
1,080 logging
statements
1,163 logging
statements
We manually examined real-life inquiries about
log lines from 3 sources
5
User mailing lists
Randomly
sampled logs
5 types of information are inquired about logs
6
What exactly does this
message mean?
When does this occur?
What could be the cause?
How can I avoid this
message/problem?
Is it affecting my data?
Experts are crucial in resolving log inquiries
7
[Bar chart: number of log inquiries for Hadoop, Cassandra, and Zookeeper, split into resolved (by expert, by non-expert) and unresolved (replied to by expert, only replied to by non-expert, not answered).]
Experts are crucial in resolving log inquiries
8
8 out of 11 resolved inquiries are resolved by experts.
Experts are crucial in resolving log inquiries
9
Inquiries are always
resolved if experts reply.
Looking for an expert is not the optimal
approach to resolve log inquiries
10
Over 20% of the
inquiries have no reply.
Wrong answers may be
posted in reply to
inquiries.
Identifying the expert
of a log line is
challenging.
First reply can take up
to 210 hours.
Can we document the inquired logs?
11
Nothing in common between inquired logs
12
An on-demand approach is needed to assist in
understanding logs.
Different log
verbosity levels
0 to 2 degrees of
fan-in
0 to 200 prior
code changes
Real-life inquiries
13
We propose to attach development knowledge
to logs
Code
commit
Issue reports
Source code
/*
…
*/
Call graph
Code comments
14
fetch failure
Source code: from method checkAndInformJobTracker of file ShuffleScheduler.java
An example of using development knowledge to
resolve inquiries of log “fetch failure”
15
fetch failure
Code comments: Notify the JobTracker after every read error, if `reportReadErrorImmediately' is true or after every `maxFetchFailuresBeforeReporting' failures
An example of using development knowledge to
resolve inquiries of log “fetch failure”
16
fetch failure
Call graph: called by method copyFailed in class ShuffleScheduler
An example of using development knowledge to
resolve inquiries of log “fetch failure”
17
fetch failure
Code commit: Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.
An example of using development knowledge to
resolve inquiries of log “fetch failure”
18
fetch failure
Issue report MAPREDUCE-1171: … This is caused by a behavioral change in hadoop 0.20.1. … One solution I could see is "Provide a config option..." …
An example of using development knowledge to
resolve inquiries of log “fetch failure”
19
fetch failure
Meaning: There is a data reading error.
Cause: One of the possible reasons is a configuration.
Context: The event happens during the shuffle period, while copying data.
Impact: The event impacts the jobtracker.
Solution: Changing a configuration option would solve the issue. Amareshwari Sriramadasu is the expert to go to.
An example of using development knowledge to
resolve inquiries of log “fetch failure”
20
Overview of our approach
Inputs: version control system, source code
Steps: generating templates for logs; matching logs with log templates; attaching development knowledge to logs
Artifacts: log templates, development knowledge
21
Step 1: Generating templates for logs
Version
control
system
foo() {
  …
  Log_statement("time=%d, Trying to launch, TaskID=%s", time, taskid);
  …
}
Generated log template: time=\d+, Trying to launch, TaskID=\S+
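The template generation in Step 1 can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes logging statements use printf-style format strings with only `%d` and `%s` specifiers, and the name `format_to_template` and the specifier table are hypothetical.

```python
import re

# Assumed mapping from printf-style specifiers to regex fragments (illustrative only).
SPECIFIER_PATTERNS = {"%d": r"\d+", "%s": r"\S+"}


def format_to_template(fmt: str) -> str:
    """Turn a logging format string into a regular-expression log template."""
    # Split on the specifiers but keep them, so static text and variables alternate.
    parts = re.split(r"(%[ds])", fmt)
    return "".join(
        SPECIFIER_PATTERNS[p] if p in SPECIFIER_PATTERNS else re.escape(p)
        for p in parts
    )


template = format_to_template("time=%d, Trying to launch, TaskID=%s")
print(template)
```

The static text is escaped so that characters such as spaces or punctuation are matched literally, while each specifier becomes a wildcard for the value filled in at run time.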
22
Step 2: Matching logs with log templates
Log template:
time=\d+, Trying to launch, TaskID=\S+
Logs:
time=1, Trying to launch, TaskID=task_1
time=2, launch task, TaskID=task_1
…
time=10, task finished, TaskID=task_1
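Matching in Step 2 could look something like the sketch below. It assumes the template placeholders are the regular expressions `\d+` and `\S+` and that a log line belongs to a template only if the whole line matches; `TEMPLATES` and `match_template` are hypothetical names, and only the single template from the slide is used.

```python
import re
from typing import Optional

# The one template shown on the slide; a real system would have one per logging statement.
TEMPLATES = [r"time=\d+, Trying to launch, TaskID=\S+"]
COMPILED = [(t, re.compile(t)) for t in TEMPLATES]


def match_template(log_line: str) -> Optional[str]:
    """Return the first template that matches the whole log line, or None."""
    for template, pattern in COMPILED:
        if pattern.fullmatch(log_line):
            return template
    return None


logs = [
    "time=1, Trying to launch, TaskID=task_1",
    "time=2, launch task, TaskID=task_1",
]
for line in logs:
    print(line, "->", match_template(line))
```

Only the first log line matches the template; the second has different static text and would need its own template.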
Step 3: Attaching development knowledge to
logs
23
Code
commit
Issue reports
Source code
/*
…
*/
Call graph
Code comments
Version
control
system
Issue
tracking
system
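One possible way to picture Step 3 is a lookup table from log templates to the development knowledge collected for the corresponding logging statement. The sketch below is an assumption-laden illustration, not the authors' tool: the class, field names, and `knowledge_for` are hypothetical, the template is a placeholder because the exact "fetch failure" log text is not shown in the slides, and the values are taken from the "fetch failure" example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DevelopmentKnowledge:
    """Development knowledge attached to one log template (field names are illustrative)."""
    source_location: str
    code_comment: str = ""
    callers: List[str] = field(default_factory=list)
    commit_messages: List[str] = field(default_factory=list)
    issue_ids: List[str] = field(default_factory=list)


# Placeholder template: the real log text for "fetch failure" is not given in the slides.
FETCH_FAILURE_TEMPLATE = r".*fetch failure.*"

KNOWLEDGE: Dict[str, DevelopmentKnowledge] = {
    FETCH_FAILURE_TEMPLATE: DevelopmentKnowledge(
        source_location="ShuffleScheduler.java: checkAndInformJobTracker",
        code_comment=(
            "Notify the JobTracker after every read error, if "
            "'reportReadErrorImmediately' is true or after every "
            "'maxFetchFailuresBeforeReporting' failures"
        ),
        callers=["ShuffleScheduler.copyFailed"],
        commit_messages=[
            "Allow shuffle retries and read-error reporting to be configurable."
        ],
        issue_ids=["MAPREDUCE-1171"],
    )
}


def knowledge_for(log_line: str) -> Optional[DevelopmentKnowledge]:
    """Return the knowledge attached to the first template matching the log line."""
    for template, knowledge in KNOWLEDGE.items():
        if re.fullmatch(template, log_line):
            return knowledge
    return None


found = knowledge_for("fetch failure")
print(found.issue_ids if found else "no development knowledge attached")
```

Keeping the knowledge keyed by template means it is retrieved on demand, only for the log lines a practitioner actually asks about.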
24
Complementing logging statements: Can development knowledge complement logging statements?
Resolving real-life log inquiries: Can development knowledge help resolve real-life inquiries?
We compare our approach against Google and
mailing list for resolving real-life log inquiries
25
Real-life inquiries
[Bar chart: percentage of resolved log inquiries]
Our approach outperforms Google and is
comparable to mailing lists in resolving log
inquiries
26
[Bar chart: percentage of each type of inquired information (meaning, cause, context, solution, impact) provided by our approach]
Our approach provides 62% of inquired log
information
27
28
Resolving log inquiries: Can development knowledge help resolve real-life inquiries? YES!
Complementing logging statements: Can development knowledge complement logging statements?
We complement a random sample of logging
statements using our approach
30
Zookeeper
300 randomly
sampled logging
statements
Development knowledge can complement
logging statements
31
[Bar chart: percentage of logging statements complemented by our approach, by information type (meaning, cause, context, solution, impact), for Hadoop, Cassandra, and Zookeeper]
Issue reports are the best development knowledge
to complement logging statements.
32
Resolving log inquiries: Can development knowledge help resolve real-life inquiries? YES!
Complementing logging statements: Can development knowledge complement logging statements? YES!
Practitioners have challenges in understanding
log lines
33
Fetch failure
What exactly
does this
message mean?
… could this
be the
cause?
Is it affecting
my data?
34
5 types of information are inquired about logs
35
What exactly does this
message mean?
When does this occur?
… could this be the cause?
It will be great if some one can
point to the direction how to
solve this?
Is it affecting my data?
http://tinyurl.com/hirePhD


Editor's Notes

  • #2 Introduce myself and the topic. Title large
  • #3 Logs record important events of the system; developers put logs there
  • #4 Logs record important events of the system; developers put logs there
  • #5 Disconnect between dev and system admin
  • #6 Disconnect between dev and system admin
  • #7 12 out of 14 ask about cause (about 86%); 6 out of 14 ask about solution (about 43%)
  • #8 Emphasize this slide more for take-home
  • #9 Emphasize this slide more for take-home
  • #10 Emphasize this slide more for take-home
  • #11 Disconnect between dev and system admin
  • #12 Disconnect between dev and system admin
  • #13 Disconnect between dev and system admin
  • #14 By knowing the logging method, I get the logging statements e.g, log4j
  • #15 neon
  • #16 neon
  • #17 neon
  • #18 neon
  • #19 neon
  • #20 neon
  • #21 By knowing the logging method, I get the logging statements e.g, log4j
  • #22 By knowing the logging method, I get the logging statements e.g, log4j
  • #23 By knowing the logging method, I get the logging statements e.g, log4j
  • #24 neon
  • #25 Mention base line approach
  • #26 Emphasize this slide more for take-home
  • #27 Emphasize this slide more for take-home
  • #28 Emphasize this slide more for take-home
  • #29 Mention base line approach
  • #30 Mention base line approach
  • #31 Emphasize this slide more for take-home
  • #32 Emphasize this slide more for take-home
  • #33 Mention base line approach
  • #34 Logs record important events of the system; developers put logs there
  • #36 12 out of 14 ask about cause (about 86%); 6 out of 14 ask about solution (about 43%)
  • #38 By knowing the logging method, I get the logging statements e.g, log4j
  • #40 Mention base line approach