Predicting system failures can be of great benefit to managers, who gain better command over system performance.
The data that systems generate in the form of logs is a valuable source of information for predicting system reliability. As such, there is an increasing demand for
tools that mine logs and provide accurate predictions. Interpreting the information in logs, however, poses some challenges. This talk
presents how to effectively mine sequences of logs and provide accurate predictions.
The approach integrates different machine learning techniques to control for data brittleness, ensure accurate model selection and validation,
and increase the robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for
car telemetry.
Mining System Logs to Learn Error Predictors, Universität Stuttgart, Stuttgart, June 2015
1. Mining System Logs to Learn Error Predictors
A Case Study of a Telemetry System
Barbara Russo
L.E.S.E.R.
Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
Barbara.Russo@unibz.it
Universität Stuttgart - June 9th, 2015
2. A collaboration between
Free University of Bozen-Bolzano, Italy
and
University of Alberta, Canada
Barbara Russo, Giancarlo Succi, Witold Pedrycz (2015). Mining system logs to learn error predictors: a case study of a telemetry system. Empirical Software Engineering, 20(4), 879-927.
3. System events
• Events describe the behaviour within and across subsystems or components
– how the system changes over time
• Logs track events
4. The value of logs
• Log events carry information on
– the software application that generated the event and its state,
– the task and the user whose interaction with the system triggered the event, and
– the timestamp at which the event was generated.
5. Logs can be cryptic
6. Errors
• Some behaviours are desirable and some are not
• Undesirable behaviours are referred to as system errors
– crashes that immediately stop the system and are easily identifiable
– deviations from the expected output that let the system keep running and reveal themselves only at the completion of system tasks
7. Meaning of errors
• Events in error state (errors) act as alerts
– Manifestations of system failures?
– Originated from a series of preceding events?
– Immediate action must be taken?
– Indication of an underlying problem?
8. Goal
• Analysing the behaviour of a (composite) system by mining logs of events and predicting future system misbehaviour
• Composite: many applications or subsystems
9. Method
• Solve a classification problem with SVM
• Build a sequence abstraction by mining logs
• Integrate several statistical techniques to control for data brittleness and for accuracy of model selection and validation
• Discuss the classification problem at different degrees of defectiveness
10. Sequences
• A single event may not suffice to predict system failures
• An event sequence is a set of events, ordered by timestamp, occurring within a given time window
• A sequence abstraction is a representation of the identified sequences in a formal, machine-readable format
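A minimal sketch of the time-window grouping, assuming log records of the form (timestamp, event type, user); the window length and record layout are illustrative, not taken from the paper:

    from datetime import datetime, timedelta

    # Hypothetical log records: (timestamp, event_type, user).
    events = [
        (datetime(2015, 6, 9, 10, 0, 0), "A", "u1"),
        (datetime(2015, 6, 9, 10, 0, 40), "B", "u1"),
        (datetime(2015, 6, 9, 10, 2, 5), "A", "u2"),
    ]

    def build_sequences(events, window=timedelta(minutes=1)):
        """Group timestamp-ordered events into sequences per time window."""
        sequences, current, start = [], [], None
        for e in sorted(events, key=lambda ev: ev[0]):
            if start is None or e[0] - start > window:
                if current:
                    sequences.append(current)
                current, start = [], e[0]
            current.append(e)
        if current:
            sequences.append(current)
        return sequences

    print(build_sequences(events))  # two sequences: [A, B] and [A]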
11. Research question
• Is the amount and type of information carried by a
sequence enough to predict errors?
14. Example – sequence type
• sv1 = [0,1,0,1]
• sv2 = [2,1,1,0]
15. Sequence type
• µi – number of events of type i in a sequence
• sv = [µ1, …, µn] – vector of event multiplicities
• ρ(sv) – total number of errors in the sequences mapping into sv
16. Features to feed SVM
• v = [sv, µ(sv), ν(sv)] – feature vector
– µ(sv) = # of sequences mapping into sv
– ν(sv) = average # of users in the sequences mapping into sv
• v is a faulty feature if at least one event in one of its sequences is in an error state
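A sketch of how sv, µ(sv), ν(sv), and ρ(sv) could be computed from mined sequences, following the definitions above; the event alphabet and the per-event error flag are illustrative assumptions:

    from collections import Counter

    EVENT_TYPES = ["A", "B", "C", "D"]  # illustrative event alphabet

    # A sequence is a list of (event_type, user, is_error) tuples.
    def sequence_vector(seq):
        """sv = [mu_1, ..., mu_n]: multiplicity of each event type."""
        counts = Counter(etype for etype, _, _ in seq)
        return tuple(counts.get(t, 0) for t in EVENT_TYPES)

    def features(sequences):
        """Build v = [sv, mu(sv), nu(sv)] and rho(sv) per sequence type."""
        groups = {}
        for seq in sequences:
            groups.setdefault(sequence_vector(seq), []).append(seq)
        feats = []
        for sv, seqs in groups.items():
            mu = len(seqs)  # number of sequences mapping into sv
            nu = sum(len({u for _, u, _ in s}) for s in seqs) / mu  # avg # users
            rho = sum(err for s in seqs for _, _, err in s)         # total errors
            feats.append((list(sv) + [mu, nu], rho))
        return feats

    seqs = [[("A", "u1", 0), ("B", "u1", 1)], [("A", "u2", 0), ("B", "u2", 1)]]
    print(features(seqs))  # [([1, 1, 0, 0, 2, 1.0], 2)]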
17. Sequence vector semantics
• Patterns of system behaviour
– If µ>1 and ρ>0, such sequences denote a recurring reliability problem
• Distributed teams
– If ν>1, the comparative analysis of features with ρ>0 or ρ=0 tells whether errors originate from multiple users working on the same tasks
18. Example - features
• v1 = [0,1,0,1; 1,1], sv1 = [0,1,0,1]
– µ(sv1) = 1, ν(sv1) = 1, ρ(sv1) = 0
• v2 = [2,1,1,0; 1,2], sv2 = [2,1,1,0]
– µ(sv2) = 1, ν(sv2) = 2, ρ(sv2) = 2
19. The classification problem
[Diagram] Features, drawn from data sets with different ex-ante distributions (faulty, non-faulty), feed a classifier that assigns them to G1 = Faulty or G2 = Non-Faulty; the ex-post classification differs across the classifier's thresholds.
20. Classification
• False positive = features v that are predicted faulty but do not contain errors, ρ(sv)=0
• True positive = features v that are predicted faulty and contain error(s), ρ(sv)>0
• False negative = features v that are predicted non-faulty but contain error(s), ρ(sv)>0
• True negative = features v that are predicted non-faulty and do not contain errors, ρ(sv)=0
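The four outcomes can be restated as a check of the classifier's verdict against ρ(sv); a minimal sketch:

    def outcome(predicted_faulty, rho):
        """Compare the classifier's verdict with the error count rho(sv)."""
        if predicted_faulty:
            return "TP" if rho > 0 else "FP"
        return "FN" if rho > 0 else "TN"

    print(outcome(True, 2))   # TP: predicted faulty, contains errors
    print(outcome(True, 0))   # FP: predicted faulty, no errors
    print(outcome(False, 1))  # FN: predicted non-faulty, contains an error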
22. Build classifiers on historical data
[Diagram] Historical data is split into a training set and a test set:
1. the training set is used to tune the classifier's parameters
2. the test set is used to compute the classifier's fitting performance
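A minimal sketch of these two steps with scikit-learn; the library choice, the parameter grid, and the synthetic data are illustrative assumptions, not the paper's setup:

    import numpy as np
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC

    # Illustrative stand-ins for the real feature vectors v and labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # 1. Tune the classifier's parameters on the training set.
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
    grid.fit(X_train, y_train)

    # 2. Compute the classifier's fitting performance on the held-out set.
    print(grid.best_params_, grid.score(X_test, y_test))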
24. Validating sequence abstraction
• Did we put too much information in our features?
– Information Gain selects the features that contribute most to the information of a given classification category
– Classification category: sequences with a given number of error events
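Information gain for a discrete feature is the mutual information between that feature and the class; a sketch with scikit-learn's estimator (the library choice and synthetic data are assumptions; the paper may compute IG differently):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(1)
    X = rng.integers(0, 4, size=(300, 6))  # illustrative discrete features
    y = (X[:, 2] > 1).astype(int)          # class driven by column 2

    ig = mutual_info_classif(X, y, discrete_features=True, random_state=1)
    keep = np.argsort(ig)[::-1][:3]        # retain the 3 most informative
    print(keep, ig[keep])                  # column 2 should rank first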
25. Control for the effect of the dataset's nature
• Does set balancing increase the quality of prediction?
– If classification categories are not equally represented in the datasets, classifiers might have low precision even though the true positive rate is high and the false positive rate is low
– Such imbalanced data sets are very frequent in software engineering
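One simple balancing scheme is to randomly undersample the majority class; a minimal sketch (the paper's k-splitting may balance differently):

    import numpy as np

    def undersample(X, y, random_state=0):
        """Balance a binary dataset by undersampling the majority class."""
        rng = np.random.default_rng(random_state)
        pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
        minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, keep])
        rng.shuffle(idx)
        return X[idx], y[idx]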
26. Parametric classification
• The problem varies depending on how many errors we allow in the system
• c – cut-off value, i.e., the number of errors in a sequence vector
• Categories:
– G1(c) = {v = [sv, µ(sv), ν(sv)] | ρ(sv) ≥ c}
– G2(c) = {v = [sv, µ(sv), ν(sv)] | ρ(sv) < c}
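The two categories translate directly into a labelling function over the cut-off c; a minimal sketch reusing the (v, ρ) features from the earlier sketch:

    def label(feats, c):
        """G1(c): rho(sv) >= c -> 1 (faulty); G2(c): rho(sv) < c -> 0."""
        return [(v, 1 if rho >= c else 0) for v, rho in feats]

    # Raising c makes the faulty class stricter: a feature whose sequences
    # contain 2 errors is faulty for c <= 2 but non-faulty for c = 3.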
28. Business Questions
• In our case study:
– Can we use Support Vector Machines to build suitable predictors?
– Is there any Support Vector Machine that performs best for all system applications?
– Is there any machine that performs best for the different levels of reliability requested of the system?
29. Descriptive analysis across apps
• 54 datasets, 25 of which have some faulty features
32. Splitting data
• Three approaches to control for artificial assumptions (a sketch of t-splitting follows this list)
– Varying the size of the split: "t-splitting"
– Reducing features with IG and varying the size: "t-splitting reduced"
– Balancing sets: "k-splitting", i.e., manipulating the sets so that the number of instances in the two categories is balanced
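A minimal sketch of t-splitting, refitting and re-evaluating as the training fraction t varies (the fractions and data are illustrative assumptions):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 6))
    y = (X[:, 1] > 0).astype(int)

    for t in (0.5, 0.6, 0.7, 0.8):  # illustrative training fractions
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=t, random_state=2)
        score = SVC().fit(X_tr, y_tr).score(X_te, y_te)
        print(f"t={t}: accuracy={score:.2f}")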
33. Types of SVM
• Different kernels
– Multilayer perceptron
– Linear
– Radial Basis Function
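In scikit-learn terms the three kernels could be instantiated as below; mapping the multilayer-perceptron kernel to the 'sigmoid' (tanh) kernel is an assumption on my part, not the paper's stated setup:

    from sklearn.svm import SVC

    kernels = {
        "MP (tanh, stand-in for multilayer perceptron)": SVC(kernel="sigmoid"),
        "Linear": SVC(kernel="linear"),
        "RBF": SVC(kernel="rbf"),
    }
    # Each machine is then tuned and evaluated per application, as in
    # the earlier train/test sketch.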
34. Fitting performance across applications
[Chart] Number of applications for which a classifier outperforms the others (by MR) in quality of fit
35. Prediction
[Charts] Prediction performance without filtering vs. filtered with IG
• Models with high fitting performance (bal > 0.73)
• Prediction performance averaged across t-splittings and models
36. Findings
• Results are better with IG filtering; MP is best across applications, but it is not the only one (cluster the applications?)
• Artificial balance does not help to identify a single classifier, but it helps to increase convergence in those classifiers that are not reduced with IG
37. Findings (superior to the literature)
• Best performance on an individual application (MP, c=3):
– 1% false positive rate, 94% true positive rate, and 95% precision
• Best performance across applications, averaged over models, for c=2:
– 9% false positive rate, 78% true positive rate, and 95% precision
38. What predictions can tell managers
• The application that manages the software tools of cars
– Pervasive in the telemetry system
• 106 distinct features over 10 different event types; 18% multiple sequences, and 89% with more than one user
• c=1
• IG reduces the features from 12 to 7, still including µ and ν
40. Prediction - assumptions
• Behaviour stays the same over the next three months
• 1000 features
• Category balance is that of the test set used for fitting (39%)
– 390 faulty features and 610 non-faulty features
41. In numbers
• We have 390 faulty features, 610 non-faulty features, and 450 predicted faulty features
• Predicted faulty features that contain no error:
– 67 = 11% * 610
• Faulty features we fail to predict: 70 = 18% * 390
           Pred pos   Pred neg   Total
    Pos       82%        18%      100%
    Neg       11%        89%      100%
    Total     45%        54%      100%
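A quick check of the slide's arithmetic, reproduced from the rates in the table (counts rounded as on the slide):

    faulty, non_faulty = 390, 610

    fp = round(0.11 * non_faulty)  # predicted faulty, no error:    67
    fn = round(0.18 * faulty)      # faulty but predicted clean:    70
    tp = round(0.82 * faulty)      # faulty and predicted faulty:  320
    tn = round(0.89 * non_faulty)  # clean and predicted clean:    543

    print(fp, fn, tp, tn)  # 67 70 320 543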
42. Cost of prediction
• Inspection cost. Wasted time ≥ 67 × the average cost to fix one error
– There might be more than one error per sequence on average
• Cost of undiscovered errors. Defect slippage ≥ 70
– A measure of system unreliability
– Cost of repairing errors at late stages (inaccuracy, higher cost due to pressure, not being able to fix)
43. Prediction
[ROC plot] True positive rate vs. false positive rate for the best prediction models (MP, RBF, and L kernels): FPr = 11%, TPr = 82%. The diagonal marks equal chance; moving right means higher inspection costs, moving down means a higher cost to fix undiscovered errors.
44. Recommendations
• Select models that accurately fit historical data before using them for predictions
– The best models for quality of fit are not always the best predictors for all splitting sizes of a feature set
• Reduce information redundancy
45. Recommendations
• Report fitting accuracy
• Use parametric classification
– The parameter being the number of errors a sequence must contain in order to be classified as defective/faulty
• Study prediction at different cut-off values, splitting sizes, or balances to solve the prediction problem independently of the level of reliability requested of the system and of the nature of the data
47. With artificial balance
• It does not help to identify a single classifier
• It helps to increase convergence in those classifiers that are not reduced with IG
48. With IG filter
[Table] Best classifiers across different t-splittings; classifiers with bal < 0.73 are not reported