6. Extracting Actual Usage from Log Files
• Model Extraction
– Typical Behavior and Atypical Behavior
• Visualization
– Actual Log File
– Extracted Model
7. Log Data
• Huge Amount
• Different Purpose
• Informal Diagnostic
• Unstructured Syntax
8. Extracting Actual Usage from Log Files
• Process Mining: Process Model Visualization
• Trace: Actual Log Visualization
• PDEng Final Project:
– Extract
– Enrich
– Combine
– Transform
11. Requirements
• Functional:
– Extract
– Enrich
– Combine
– Transform
• Non-functional:
– Portability
• Architectural:
– Processes a Stream of Data
– Decomposes Tasks
– Decouples Tasks
– Defers Binding Time
[Diagram: Filter → pipe → Filter → pipe → Filter → pipe → Filter]
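The pipe-and-filter style named above can be sketched with Python generators. This is only a minimal illustration, not the prototype's implementation: the semicolon-separated line format, the `source` attribute, and all function names are assumptions made for the example. The Combine step is left out here because it needs more than one input stream.

```python
def extract(lines):
    """Filter 1: drop empty lines and split each line into raw fields."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(";")

def enrich(records):
    """Filter 2: attach a (hypothetical) 'source' attribute to each record."""
    for fields in records:
        yield {"timestamp": fields[0], "event": fields[1], "source": "log-a"}

def transform(events):
    """Filter 3: render each event in a simple textual output form."""
    for e in events:
        yield f"{e['timestamp']} {e['source']}:{e['event']}"

def pipeline(source, *filters):
    """Connect filters with pipes: each filter consumes the previous output."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream

# Assumed input format: "<timestamp>;<event>", one event per line.
log_lines = ["2011-01-01T10:00:00;START", "", "2011-01-01T10:05:00;STOP"]
result = list(pipeline(log_lines, extract, enrich, transform))
```

Because every filter is a generator, the data is processed as a stream, the tasks stay decoupled, and the order of filters is bound only when `pipeline` is called.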
12. Architecture Criteria
• Number of data sources?
• Complete or incomplete event instances?
• Complementary information available?
• System output
– ProM?
– Trace?
13. Prototype Architecture for ASML
[Diagram: pipe-and-filter prototype; the components below have in/out ports and are connected by «pipe» connectors]
• «generic» Log File A, Log File B, Log File C
• «generic» Log A Event Parser, Log B Event Parser, Log C Event Parser
• «parameterized» Regular Expression Library
• «generic» Item Parser
• «parameterized» Lookup Table
• «domain-specific» Log B Event Enricher, Log C Event Enricher
• «domain-specific» EventCombiner
• «generic» MxmlSerializer
• «generic» Trace Transformer
• «generic» Trace
• «filter» ProcessMiner
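How a parser, an enricher, and a combiner of this kind might fit together can be sketched as follows. Everything concrete here is an assumption for illustration: the two line formats, the regular expressions, the event codes, and the lookup table contents are invented and do not come from the actual log files.

```python
import heapq
import re
from datetime import datetime

# Hypothetical "regular expression library": one pattern per log type.
PATTERNS = {
    "A": re.compile(r"^(?P<ts>\S+) (?P<code>\w+)$"),
    "B": re.compile(r"^\[(?P<ts>[^\]]+)\] code=(?P<code>\w+)$"),
}

# Hypothetical lookup table mapping event codes to readable activity names.
LOOKUP = {"E1": "Start job", "E2": "Stop job"}

def parse(lines, log_type):
    """Event parser: turn matching lines into events, skip all other lines."""
    pattern = PATTERNS[log_type]
    for line in lines:
        m = pattern.match(line)
        if m:
            yield {
                "ts": datetime.fromisoformat(m["ts"]),
                "code": m["code"],
                "log": log_type,
            }

def enrich(events, table):
    """Event enricher: add complementary information from a lookup table."""
    for e in events:
        yield {**e, "activity": table.get(e["code"], "unknown")}

def combine(*streams):
    """Event combiner: merge streams by timestamp (each must be sorted)."""
    return heapq.merge(*streams, key=lambda e: e["ts"])

log_a = ["2011-01-01T10:00:00 E1", "2011-01-01T10:10:00 E2"]
log_b = ["[2011-01-01T10:05:00] code=E3", "free-text diagnostic line"]

combined = list(combine(
    enrich(parse(log_a, "A"), LOOKUP),
    enrich(parse(log_b, "B"), LOOKUP),
))
```

In this sketch, supporting another log format only means adding a pattern to `PATTERNS`, which is the sense in which a component is parameterized rather than rewritten.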
17. Building a Bridge between Industry and Academia
[Diagram labels:]
• Process Mining
• Applied Process Mining
• Feedback
• Logging Infrastructure Improvements
• Customer Profiles
• Process Mining Improvements
18. Summary
• Customer Profiles automatically extracted from log files
• Prototype
• Portable Architecture
System in a system context
They share a common problem
How to get a grip on the usage context
Philips lights become part of a security system or a building management (automation) system
An MRI scanner is a node in a hospital management system and a health monitoring system (together with ultrasound and X-ray)
Complex logistics systems, e.g. an airport flight control system
Integration of systems
Minimizing the gap => maximizing the chance of success
The problem that I had to solve
Problem for me
Problem for the customers
How to extract relevant information from a huge amount of unstructured data
Design and implement an automated process that uses the log files as input and provides suitable output for ProM, Trace, …
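As an illustration of producing output for ProM, the sketch below writes events in an MXML-like layout using Python's standard library. The element names (`WorkflowLog`, `Process`, `ProcessInstance`, `AuditTrailEntry`, `WorkflowModelElement`, `EventType`, `Timestamp`) reflect my understanding of MXML and should be checked against the schema; the function name, the input shape, and the sample data are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def to_mxml(traces, process_id="example"):
    """Serialize {case_id: [(activity, timestamp), ...]} to an MXML-like string."""
    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    for case_id, events in traces.items():
        instance = ET.SubElement(process, "ProcessInstance", id=case_id)
        for activity, timestamp in events:
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            # Assumption: every logged event marks a completed activity.
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = timestamp
    return ET.tostring(root, encoding="unicode")

# Hypothetical traces: one process instance per case.
traces = {
    "case-1": [("Start job", "2011-01-01T10:00:00"),
               ("Stop job", "2011-01-01T10:10:00")],
    "case-2": [("Start job", "2011-01-01T11:00:00")],
}
mxml = to_mxml(traces)
```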
Generic Component – no alteration is needed
Parameterized Component – alteration of the component parameter(s) is needed
Domain-specific Component – complete alteration of the component is needed
Fields that were not considered important by academia but are very relevant for industry