Scalable Model Inference for Component-based System Logs

PRINS: Scalable Model Inference for
Component-based System Logs*
Donghwan Shin1), Domenico Bianculli2), and Lionel Briand2,3)
1) University of She
ffi
eld
2) University of Luxembourg
3) University of Ottawa
* This presentation is for the Journal-First Track at ICSE 2023; the original paper was accepted in Empirical Software Engineering (EMSE) journal.

A
B
Y
Z
…
Model
Inference
Technique
ith execution
20190621.001 A
20190621.002 B
20190621.002 Z
20190621.002 B
…
ith execution
20190621.001 A
20190621.002 B
20190621.002 Z
20190621.002 B
…
ith execution
20221101.001 A
20221101.004 B
20221101.011 Z
20221101.013 B
20221101.101 Y
…
System Logs System Model
Log = A sequence of log entries representing a single execution
fl
ow
Too large
Not Scalable
Enough
No Models
2

081111 090711 25010 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:38382 dest: /10.251.65.203:50010
081111 090711 00031 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand8/_temporary/part-00156. blk_5652408071925555972
081111 090756 25011 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_5652408071925555972 terminating
081111 090756 25011 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203
081111 090756 00027 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.71.16:50010 is added to blk_5652408071925555972
081111 111345 00013 INFO dfs.DataBlockScanner: Veri
fi
cation succeeded for blk_5652408071925555972
Example HDFS Log
Component IDs
3
Observation: Systems are often composed of multiple components

What if we infer INDIVIDUAL component
models and then stitch them together?
4

System Logs
eA
1
eA
2
eB
4
eB
4
eA
1
eA
2
eB
4
eA
1
eA
3
eB
5
eA
1
eA
2
eB
4
eB
4
eA
1
eA
2
eB
4
eA
1
eA
3
eB
5
ax
bx
dy
dy
ax
bx
dy
ax
cx
ey
PRINS: PRojection-INference-Stitching
s0
s1
s2
s3
s4
a b
c
d
e
INference
Model of x
Model of y
INference
Component x
Component y
PRojection
eA
1
eA
2
eA
1
eA
2
eA
1
eA
3
eA
1
eA
2
eA
1
eA
2
eA
1
eA
3
ax
bx
ax
bx
ax
cx
eB
4
eB
4
eB
4
eB
5
eB
4
eB
4
eB
4
eB
5
dy
dy
dy
ey
s0
s1
s2
a b
c
d
s4
e
Stitching
System Model
+ (optional) Heuristic
Determinisation (HD)

Research Questions
• RQ1: How does the execution time of PRINS change according to the parallel
inference tasks in the inference stage?
• RQ2: How does the execution time of change according to parameter ?
• RQ3: How does the accuracy of the models (in the form of gFSMs) generated
by change according to parameter ?
• RQ4: How fast is PRINS when compared to state-of-the-art model inference
techniques?
• RQ5: How accurate are the models generated by PRINS compared to those
generated by state-of-the-art model inference techniques?
HDu u
HDu u
6
Parallel
inference
Heuristic
Determinisation
PRINS
(compared
to
MINT)

Research Questions
• RQ1: How does the execution time of PRINS change according to the parallel
inference tasks in the inference stage?
• RQ2: How does the execution time of change according to parameter ?
• RQ3: How does the accuracy of the models (in the form of gFSMs) generated
by change according to parameter ?
• RQ4: How fast is PRINS when compared to state-of-the-art model inference
techniques?
• RQ5: How accurate are the models generated by PRINS compared to those
generated by state-of-the-art model inference techniques?
HDu u
HDu u
7
Parallel
inference
Heuristic
Determinisation
PRINS
(compared
to
MINT)

RQ4: Execution Time of PRINS compared to MINT
2 4 6 8
5
10
15
20
Execution
Time
(s)
Hadoop
MINT
PRINS-N
PRINS-P
2 4 6 8
0
5000
10000
HDFS
MINT
PRINS-N
PRINS-P
2 4 6 8
0
5000
10000
15000
Linux
MINT
PRINS-N
PRINS-P
2 4
0
2500
5000
7500
10000
Zookeeper
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
5000
10000
15000
Execution
Time
(s)
CoreSync
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
2.5
5.0
7.5
10.0
12.5
NGLClient
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
10000
20000
30000
Oobelib
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
5000
10000
15000
PDApp
MINT
PRINS-N
PRINS-P
PRINS-N = PRINS with No parallel inference (HD is enabled to be fair with MINT)
PRINS-P = PRINS with Parallel inference (HD is enabled to be fair with MINT)
Duplication Factor = How many times each log is duplicated to increase the input log size systematically
8

RQ5: Accuracy of PRINS compared to MINT
9

Downside: Size of System Models
10

Contributions
• Tame the scalability issue of model
inference using divide-and-conquer.
• Present an empirical evaluation of
PRINS and its comparison with the
state-of-the-art model inference tool.
• It works especially well when the
components appearing in di
ff
erent
executions are similar.
• Provide a publicly available
implementation of PRINS.
11
Paper (Open Access) Replication Package

Scalable Model Inference for Component-based System Logs

Recommended

Recommended

More Related Content

Similar to Scalable Model Inference for Component-based System Logs

Similar to Scalable Model Inference for Component-based System Logs (20)

More from Lionel Briand

More from Lionel Briand (20)

Recently uploaded

Recently uploaded (20)

Scalable Model Inference for Component-based System Logs