SlideShare a Scribd company logo
1 of 11
Download to read offline
PRINS: Scalable Model Inference for
Component-based System Logs*
Donghwan Shin1), Domenico Bianculli2), and Lionel Briand2,3)
1) University of She
ffi
eld
2) University of Luxembourg
3) University of Ottawa
* This presentation is for the Journal-First Track at ICSE 2023; the original paper was accepted in Empirical Software Engineering (EMSE) journal.
A
B
Y
Z
…
Model
Inference
Technique
ith execution
20190621.001 A
20190621.002 B
20190621.002 Z
20190621.002 B
…
ith execution
20190621.001 A
20190621.002 B
20190621.002 Z
20190621.002 B
…
ith execution
20221101.001 A
20221101.004 B
20221101.011 Z
20221101.013 B
20221101.101 Y
…
System Logs System Model
Log = A sequence of log entries representing a single execution
fl
ow
Too large
Not Scalable
Enough
No Models
2
081111 090711 25010 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:38382 dest: /10.251.65.203:50010
081111 090711 25181 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.27.63:54730 dest: /10.251.27.63:50010
081111 090711 25487 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:40305 dest: /10.251.65.203:50010
081111 090711 00031 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand8/_temporary/part-00156. blk_5652408071925555972
081111 090756 25011 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_5652408071925555972 terminating
081111 090756 25011 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203
081111 090756 25184 INFO dfs.DataNode$PacketResponder: PacketResponder 0 for block blk_5652408071925555972 terminating
081111 090756 25184 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.27.63
081111 090756 25488 INFO dfs.DataNode$PacketResponder: PacketResponder 1 for block blk_5652408071925555972 terminating
081111 090756 25488 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203
081111 090756 00027 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.71.16:50010 is added to blk_5652408071925555972
081111 111345 00013 INFO dfs.DataBlockScanner: Veri
fi
cation succeeded for blk_5652408071925555972
Example HDFS Log
Component IDs
3
Observation: Systems are often composed of multiple components
What if we infer INDIVIDUAL component
models and then stitch them together?
4
System Logs
eA
1
eA
2
eB
4
eB
4
eA
1
eA
2
eB
4
eA
1
eA
3
eB
5
eA
1
eA
2
eB
4
eB
4
eA
1
eA
2
eB
4
eA
1
eA
3
eB
5
ax
bx
dy
dy
ax
bx
dy
ax
cx
ey
PRINS: PRojection-INference-Stitching
s0
s1
s2
s3
s4
a b
c
d
e
INference
Model of x
Model of y
INference
Component x
Component y
PRojection
eA
1
eA
2
eA
1
eA
2
eA
1
eA
3
eA
1
eA
2
eA
1
eA
2
eA
1
eA
3
ax
bx
ax
bx
ax
cx
eB
4
eB
4
eB
4
eB
5
eB
4
eB
4
eB
4
eB
5
dy
dy
dy
ey
s0
s1
s2
a b
c
d
s4
e
Stitching
System Model
+ (optional) Heuristic
Determinisation (HD)
Research Questions
• RQ1: How does the execution time of PRINS change according to the parallel
inference tasks in the inference stage?
• RQ2: How does the execution time of change according to parameter ?
• RQ3: How does the accuracy of the models (in the form of gFSMs) generated
by change according to parameter ?
• RQ4: How fast is PRINS when compared to state-of-the-art model inference
techniques?
• RQ5: How accurate are the models generated by PRINS compared to those
generated by state-of-the-art model inference techniques?
HDu u
HDu u
6
Parallel
inference
Heuristic
Determinisation
PRINS
(compared
to
MINT)
Research Questions
• RQ1: How does the execution time of PRINS change according to the parallel
inference tasks in the inference stage?
• RQ2: How does the execution time of change according to parameter ?
• RQ3: How does the accuracy of the models (in the form of gFSMs) generated
by change according to parameter ?
• RQ4: How fast is PRINS when compared to state-of-the-art model inference
techniques?
• RQ5: How accurate are the models generated by PRINS compared to those
generated by state-of-the-art model inference techniques?
HDu u
HDu u
7
Parallel
inference
Heuristic
Determinisation
PRINS
(compared
to
MINT)
RQ4: Execution Time of PRINS compared to MINT
2 4 6 8
5
10
15
20
Execution
Time
(s)
Hadoop
MINT
PRINS-N
PRINS-P
2 4 6 8
0
5000
10000
HDFS
MINT
PRINS-N
PRINS-P
2 4 6 8
0
5000
10000
15000
Linux
MINT
PRINS-N
PRINS-P
2 4
0
2500
5000
7500
10000
Zookeeper
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
5000
10000
15000
Execution
Time
(s)
CoreSync
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
2.5
5.0
7.5
10.0
12.5
NGLClient
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
10000
20000
30000
Oobelib
MINT
PRINS-N
PRINS-P
2 4 6 8
Duplication Factor
0
5000
10000
15000
PDApp
MINT
PRINS-N
PRINS-P
PRINS-N = PRINS with No parallel inference (HD is enabled to be fair with MINT)
PRINS-P = PRINS with Parallel inference (HD is enabled to be fair with MINT)
Duplication Factor = How many times each log is duplicated to increase the input log size systematically
8
RQ5: Accuracy of PRINS compared to MINT
9
Downside: Size of System Models
10
Contributions
• Tame the scalability issue of model
inference using divide-and-conquer.
• Present an empirical evaluation of
PRINS and its comparison with the
state-of-the-art model inference tool.
• It works especially well when the
components appearing in di
ff
erent
executions are similar.
• Provide a publicly available
implementation of PRINS.
11
Paper (Open Access) Replication Package

More Related Content

Similar to Scalable Model Inference for Component-based System Logs

Busy Polling: Past, Present, Future
Busy Polling: Past,      Present, FutureBusy Polling: Past,      Present, Future
Busy Polling: Past, Present, FutureVenkatPulimi
 
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...I MT
 
A GitOps model for High Availability and Disaster Recovery on EKS
A GitOps model for High Availability and Disaster Recovery on EKSA GitOps model for High Availability and Disaster Recovery on EKS
A GitOps model for High Availability and Disaster Recovery on EKSWeaveworks
 
Data acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor NetworkData acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor NetworkRutvik Pensionwar
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Yoshitake Kobayashi
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
Simulation Management and Execution Control
Simulation Management and Execution ControlSimulation Management and Execution Control
Simulation Management and Execution ControlDaniel Wheeler
 
Addressing Network Operator Challenges in YANG push Data Mesh Integration
Addressing Network Operator Challenges in YANG push Data Mesh IntegrationAddressing Network Operator Challenges in YANG push Data Mesh Integration
Addressing Network Operator Challenges in YANG push Data Mesh IntegrationThomasGraf42
 
It5304 syllabus
It5304 syllabusIt5304 syllabus
It5304 syllabusnimal83
 
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...ijesajournal
 
Support of Hostname and Sequencing in YANG Notifications
Support of Hostname and Sequencing in YANG NotificationsSupport of Hostname and Sequencing in YANG Notifications
Support of Hostname and Sequencing in YANG NotificationsThomasGraf42
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPFAlex Maestretti
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at NetflixBrendan Gregg
 
Microservice - Up to 500k CCU
Microservice - Up to 500k CCUMicroservice - Up to 500k CCU
Microservice - Up to 500k CCUViet Tran
 
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...Christian Esteve Rothenberg
 

Similar to Scalable Model Inference for Component-based System Logs (20)

Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Busy Polling: Past, Present, Future
Busy Polling: Past,      Present, FutureBusy Polling: Past,      Present, Future
Busy Polling: Past, Present, Future
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
SDN and metrics from the SDOs
SDN and metrics from the SDOsSDN and metrics from the SDOs
SDN and metrics from the SDOs
 
slides
slidesslides
slides
 
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - L'IA pou...
 
A GitOps model for High Availability and Disaster Recovery on EKS
A GitOps model for High Availability and Disaster Recovery on EKSA GitOps model for High Availability and Disaster Recovery on EKS
A GitOps model for High Availability and Disaster Recovery on EKS
 
Data acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor NetworkData acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor Network
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Simulation Management and Execution Control
Simulation Management and Execution ControlSimulation Management and Execution Control
Simulation Management and Execution Control
 
Addressing Network Operator Challenges in YANG push Data Mesh Integration
Addressing Network Operator Challenges in YANG push Data Mesh IntegrationAddressing Network Operator Challenges in YANG push Data Mesh Integration
Addressing Network Operator Challenges in YANG push Data Mesh Integration
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
It5304 syllabus
It5304 syllabusIt5304 syllabus
It5304 syllabus
 
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
 
Support of Hostname and Sequencing in YANG Notifications
Support of Hostname and Sequencing in YANG NotificationsSupport of Hostname and Sequencing in YANG Notifications
Support of Hostname and Sequencing in YANG Notifications
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflix
 
Microservice - Up to 500k CCU
Microservice - Up to 500k CCUMicroservice - Up to 500k CCU
Microservice - Up to 500k CCU
 
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...
IEEE HPSR 2017 Keynote: Softwarized Dataplanes and the P^3 trade-offs: Progra...
 

More from Lionel Briand

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityLionel Briand
 
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Lionel Briand
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingLionel Briand
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsLionel Briand
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsLionel Briand
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...Lionel Briand
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Lionel Briand
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingLionel Briand
 
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Lionel Briand
 
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyAutonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyLionel Briand
 
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Lionel Briand
 
Reinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationReinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationLionel Briand
 
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Lionel Briand
 
On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...Lionel Briand
 
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Lionel Briand
 
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Lionel Briand
 
A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...Lionel Briand
 
Requirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and ApplicationsRequirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and ApplicationsLionel Briand
 

More from Lionel Briand (20)

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System Security
 
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation Testing
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical Systems
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
 
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
 
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyAutonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
 
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
 
Reinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationReinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case Prioritization
 
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
 
On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...
 
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
 
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
 
A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...
 
Requirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and ApplicationsRequirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and Applications
 

Recently uploaded

What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 

Recently uploaded (20)

What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

Scalable Model Inference for Component-based System Logs

  • 1. PRINS: Scalable Model Inference for Component-based System Logs* Donghwan Shin1), Domenico Bianculli2), and Lionel Briand2,3) 1) University of She ffi eld 2) University of Luxembourg 3) University of Ottawa * This presentation is for the Journal-First Track at ICSE 2023; the original paper was accepted in Empirical Software Engineering (EMSE) journal.
  • 2. A B Y Z … Model Inference Technique ith execution 20190621.001 A 20190621.002 B 20190621.002 Z 20190621.002 B … ith execution 20190621.001 A 20190621.002 B 20190621.002 Z 20190621.002 B … ith execution 20221101.001 A 20221101.004 B 20221101.011 Z 20221101.013 B 20221101.101 Y … System Logs System Model Log = A sequence of log entries representing a single execution fl ow Too large Not Scalable Enough No Models 2
  • 3. 081111 090711 25010 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:38382 dest: /10.251.65.203:50010 081111 090711 25181 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.27.63:54730 dest: /10.251.27.63:50010 081111 090711 25487 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:40305 dest: /10.251.65.203:50010 081111 090711 00031 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand8/_temporary/part-00156. blk_5652408071925555972 081111 090756 25011 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_5652408071925555972 terminating 081111 090756 25011 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203 081111 090756 25184 INFO dfs.DataNode$PacketResponder: PacketResponder 0 for block blk_5652408071925555972 terminating 081111 090756 25184 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.27.63 081111 090756 25488 INFO dfs.DataNode$PacketResponder: PacketResponder 1 for block blk_5652408071925555972 terminating 081111 090756 25488 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203 081111 090756 00027 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.71.16:50010 is added to blk_5652408071925555972 081111 111345 00013 INFO dfs.DataBlockScanner: Veri fi cation succeeded for blk_5652408071925555972 Example HDFS Log Component IDs 3 Observation: Systems are often composed of multiple components
  • 4. What if we infer INDIVIDUAL component models and then stitch them together? 4
  • 5. System Logs eA 1 eA 2 eB 4 eB 4 eA 1 eA 2 eB 4 eA 1 eA 3 eB 5 eA 1 eA 2 eB 4 eB 4 eA 1 eA 2 eB 4 eA 1 eA 3 eB 5 ax bx dy dy ax bx dy ax cx ey PRINS: PRojection-INference-Stitching s0 s1 s2 s3 s4 a b c d e INference Model of x Model of y INference Component x Component y PRojection eA 1 eA 2 eA 1 eA 2 eA 1 eA 3 eA 1 eA 2 eA 1 eA 2 eA 1 eA 3 ax bx ax bx ax cx eB 4 eB 4 eB 4 eB 5 eB 4 eB 4 eB 4 eB 5 dy dy dy ey s0 s1 s2 a b c d s4 e Stitching System Model + (optional) Heuristic Determinisation (HD)
  • 6. Research Questions • RQ1: How does the execution time of PRINS change according to the parallel inference tasks in the inference stage? • RQ2: How does the execution time of change according to parameter ? • RQ3: How does the accuracy of the models (in the form of gFSMs) generated by change according to parameter ? • RQ4: How fast is PRINS when compared to state-of-the-art model inference techniques? • RQ5: How accurate are the models generated by PRINS compared to those generated by state-of-the-art model inference techniques? HDu u HDu u 6 Parallel inference Heuristic Determinisation PRINS (compared to MINT)
  • 7. Research Questions • RQ1: How does the execution time of PRINS change according to the parallel inference tasks in the inference stage? • RQ2: How does the execution time of change according to parameter ? • RQ3: How does the accuracy of the models (in the form of gFSMs) generated by change according to parameter ? • RQ4: How fast is PRINS when compared to state-of-the-art model inference techniques? • RQ5: How accurate are the models generated by PRINS compared to those generated by state-of-the-art model inference techniques? HDu u HDu u 7 Parallel inference Heuristic Determinisation PRINS (compared to MINT)
  • 8. RQ4: Execution Time of PRINS compared to MINT 2 4 6 8 5 10 15 20 Execution Time (s) Hadoop MINT PRINS-N PRINS-P 2 4 6 8 0 5000 10000 HDFS MINT PRINS-N PRINS-P 2 4 6 8 0 5000 10000 15000 Linux MINT PRINS-N PRINS-P 2 4 0 2500 5000 7500 10000 Zookeeper MINT PRINS-N PRINS-P 2 4 6 8 Duplication Factor 0 5000 10000 15000 Execution Time (s) CoreSync MINT PRINS-N PRINS-P 2 4 6 8 Duplication Factor 2.5 5.0 7.5 10.0 12.5 NGLClient MINT PRINS-N PRINS-P 2 4 6 8 Duplication Factor 0 10000 20000 30000 Oobelib MINT PRINS-N PRINS-P 2 4 6 8 Duplication Factor 0 5000 10000 15000 PDApp MINT PRINS-N PRINS-P PRINS-N = PRINS with No parallel inference (HD is enabled to be fair with MINT) PRINS-P = PRINS with Parallel inference (HD is enabled to be fair with MINT) Duplication Factor = How many times each log is duplicated to increase the input log size systematically 8
  • 9. RQ5: Accuracy of PRINS compared to MINT 9
  • 10. Downside: Size of System Models 10
  • 11. Contributions • Tame the scalability issue of model inference using divide-and-conquer. • Present an empirical evaluation of PRINS and its comparison with the state-of-the-art model inference tool. • It works especially well when the components appearing in di ff erent executions are similar. • Provide a publicly available implementation of PRINS. 11 Paper (Open Access) Replication Package