1. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
Merging Event Logs in ProM
Jan Claes
Ghent University
http://processmining.ugent.be
Faculty of Economics and Business Administration Jan Claes for TUe 2012
Department of Management Information and Operations Management 6 February, 2012
2. Merging Event Logs
[Diagram: Multiple event logs → ProM plugin (?) → Merged event log]
3. Merging Event Logs
1. Find links 2. Merge chronologically 3. Add unlinked traces 4. Put in new log file
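The four steps above can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: it assumes a log is a list of traces, a trace is a list of event dicts with a comparable "time" field, and the links from step 1 are already given as (index in log 1, index in log 2) pairs in which each trace appears at most once. The name `merge_logs` is hypothetical.

```python
def merge_logs(log1, log2, links):
    """Merge two logs, given links as (i, j) trace index pairs (step 1)."""
    merged, linked1, linked2 = [], set(), set()
    # Step 2: combine each linked pair and order its events chronologically.
    for i, j in links:
        events = sorted(log1[i] + log2[j], key=lambda e: e["time"])
        merged.append(events)
        linked1.add(i)
        linked2.add(j)
    # Step 3: add the traces that take part in no link, unchanged.
    merged += [t for i, t in enumerate(log1) if i not in linked1]
    merged += [t for j, t in enumerate(log2) if j not in linked2]
    # Step 4: the returned list plays the role of the new log file.
    return merged
```

Because Python's sort is stable, events with equal timestamps keep the order "log 1 first", which is a choice of this sketch only.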
4. Approaches
Genetic Algorithm
J. Claes, G. Poels, Integrating Computer Log Files for Process Mining: a Genetic
Algorithm Inspired Technique, in CAiSE 2011 Workshops, LNBIP 83, 2011
Artificial Immune System
J. Claes, G. Poels, Merging Computer Log Files for Process Mining: an Artificial
Immune System Technique, in BPM 2011 Workshops, LNBIP 99, 2011
Rule Based
J. Claes, G. Poels, Merging Event Logs for Process Mining: A Rule Based Merging
Method and Rule Suggestion Algorithm, to be submitted in 2012
5. 1. Genetic Algorithm
6. 1. Genetic Algorithm
[Diagram: random initial population (RAND POP) → fitness → Selection (SEL POP) → Reproduction by cross-over and mutation (MUT POP) → next population]
7. 1. Genetic Algorithm
Fitness function
Sum of weighted factor scores per link
• Same trace id (STIi)
• Trace order (TOi) if all start events are in the first log
• Equal attribute values (EAVi)
• Number of linked traces (NLTi)
• Time distance (TDi)
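As a rough sketch of "sum of weighted factor scores per link": the slides do not say how each factor is scored or which weights are used, so the function below simply takes the per-link factor scores and the weights as inputs. The names `fitness`, `link_scores` and `weights` are hypothetical.

```python
FACTORS = ("STI", "TO", "EAV", "NLT", "TD")

def fitness(link_scores, weights):
    """Sum, over all links, the weighted score of every factor.

    link_scores: one dict per link, mapping factor name -> score (assumed given).
    weights:     dict mapping factor name -> weight (assumed given).
    """
    return sum(weights[f] * scores[f]
               for scores in link_scores
               for f in FACTORS)
```

For example, one link scored STI=1, TO=0, EAV=3, NLT=1, TD=0 with a weight of 2 on STI and 1 elsewhere gives 2 + 0 + 3 + 1 + 0 = 6.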
8. 1. Genetic Algorithm
Simplification
Population size one
Only mutations
Improvements
More intelligent start population (not random)
More intelligent mutations (improve at least one
factor of the fitness function)
Attention
Intensification vs. diversification
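With a population of one and only mutations, the search reduces to repeatedly mutating a single solution. The sketch below is one possible reading, under the assumption (not stated on the slide) that a mutated candidate replaces the current solution only when its fitness is not lower. `mutate` and `fitness` are placeholders supplied by the caller.

```python
import random

def single_solution_search(start, mutate, fitness, steps, rng):
    """Mutation-only search with a population of size one."""
    best, best_score = start, fitness(start)
    for _ in range(steps):
        candidate = mutate(best, rng)      # only mutations, no cross-over
        score = fitness(candidate)
        if score >= best_score:            # assumption: never accept a worse solution
            best, best_score = candidate, score
    return best, best_score
```

Always rejecting worse candidates is pure intensification; the "Attention" point on the slide suggests some diversification is needed as well, which this sketch leaves out.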
9. 2. Artificial Immune System
10. 2. Artificial Immune System
[Figure labels: Immune cells (type B-cell), Antigen, Antibodies (receptor)]
11. 2. Artificial Immune System
[Diagram: Affinity maturation. Initial population (INIT: RAND or SEED POP) → sorted POP (HIGH to LOW) → Clonal selection (CLONE) → Hypermutation (MUT: mutations) → Receptor editing (EDIT)]
12. 2. Artificial Immune System
Clonal selection
Clone the fittest x% solutions (I)
Hypermutation
Randomly change each clone
The higher the fitness score, the fewer changes (I)
Receptor editing
Take the best y% solutions (I)
Add totally random solutions to the set (D)
(I: Intensification, D: Diversification)
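One generation of the three steps above might look like the sketch below. Several details are assumptions made for illustration: the number of changes grows with the clone's rank rather than with its exact score, receptor editing picks the best solutions from the old population plus the mutated clones, and the population size is kept constant. All function and parameter names are hypothetical.

```python
import random

def ais_generation(population, fitness, mutate_once, random_solution,
                   clone_frac, keep_frac, max_mutations, rng):
    size = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    # Clonal selection: clone the fittest x% (intensification).
    n_clones = max(1, int(size * clone_frac))
    clones = ranked[:n_clones]
    # Hypermutation: the fittest clone gets 1 change, the least fit gets max_mutations.
    mutated = []
    for rank, clone in enumerate(clones):
        n_changes = 1 + (max_mutations - 1) * rank // max(1, n_clones - 1)
        for _ in range(n_changes):
            clone = mutate_once(clone, rng)
        mutated.append(clone)
    # Receptor editing: keep the best y% (intensification) ...
    pool = sorted(ranked + mutated, key=fitness, reverse=True)
    n_keep = max(1, int(size * keep_frac))
    survivors = pool[:n_keep]
    # ... and fill up with totally random solutions (diversification).
    newcomers = [random_solution(rng) for _ in range(size - n_keep)]
    return survivors + newcomers
```

Here `clone_frac` and `keep_frac` stand for the x% and y% on the slide, expressed as fractions between 0 and 1.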
13. 2. Artificial Immune System
Hypermutation
Choose ‘random’ indicator factor to improve
• Higher chance to pick factors with positive previous effect
Choose random action
• Add link, remove link or alter link
Choose random candidate
• From all solutions that would improve with selected action
Choose random improvement
• From all possible improvements for selected candidate
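The first two choices above can be illustrated with weighted random selection. This is a sketch under assumptions: the slide does not say how a "positive previous effect" is recorded, so the example keeps a hypothetical count of past improvements per factor and uses `1 + count` as the selection weight, so that no factor is ever excluded.

```python
import random

ACTIONS = ["add link", "remove link", "alter link"]

def choose_factor(success_counts, rng):
    """Pick a factor; factors that helped before are more likely to be picked."""
    factors = list(success_counts)
    weights = [1 + success_counts[f] for f in factors]   # assumed weighting scheme
    return rng.choices(factors, weights=weights, k=1)[0]

def choose_action(rng):
    """Pick one of the three actions uniformly at random."""
    return rng.choice(ACTIONS)
```

Choosing the candidate and the concrete improvement would follow the same pattern with `rng.choice` over the applicable options.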
14. 3. Rule Based
15. 3. Rule Based
Automatic merging is not transparent
(how good is the merging result?)
Previous algorithms are (too) slow
My experience
In most cases it comes down to finding an attribute value
(literally) in a trace of the other log
You need data experts/analysts to get the right
data; they usually have a good idea about the link
between the two log files
16. 3. Rule Based
Semi-automatic solution
Let user configure merging rule based on attribute
values
• More transparent
• Faster
• Includes expert knowledge if available
Help user by suggesting merging rules based on
the data in the log
17. 3. Rule Based
Merging rules
Merge all traces where…
attribute <select name> from <select container> in the 1st log
<select operator>
attribute <select name> from <select container> in the 2nd log
E.g. Merge all traces where attribute Trace ID from a trace in
the 1st log equals attribute Supplier Reference from event Send
goods in the 2nd log
18. 3. Rule Based
<select name>
• Contains all possible attribute names available in the log
<select container>
• From a trace
• From any event in a trace
• From a trace or any event in a trace
• From event X, From event Y, From event Z, …
<select operator>
• equals, is not equal, greater than, greater or equal, …
• comes before, comes after
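A merging rule of this shape can be sketched as a 5-tuple (name, container, operator, name, container). The data layout is an assumption for illustration: a trace is a dict with "attributes" and "events", and each event has a "name" and its own "attributes". The container strings and the function names are hypothetical, and the "comes before" / "comes after" operators are left out.

```python
import operator

OPERATORS = {"equals": operator.eq, "is not equal": operator.ne,
             "greater than": operator.gt, "greater or equal": operator.ge}

def values(trace, name, container):
    """Collect the values of attribute `name` from the selected container."""
    found = []
    if container in ("trace", "trace or any event") and name in trace["attributes"]:
        found.append(trace["attributes"][name])
    for event in trace["events"]:
        in_scope = (container in ("any event", "trace or any event")
                    or container == "event " + event["name"])
        if in_scope and name in event["attributes"]:
            found.append(event["attributes"][name])
    return found

def find_links(log1, log2, rule):
    """Return (i, j) for every pair of traces that satisfies the rule."""
    name1, container1, op, name2, container2 = rule
    compare = OPERATORS[op]
    return [(i, j)
            for i, t1 in enumerate(log1)
            for j, t2 in enumerate(log2)
            if any(compare(a, b)
                   for a in values(t1, name1, container1)
                   for b in values(t2, name2, container2))]
```

The example rule from the slides would then be written as `("Trace ID", "trace", "equals", "Supplier Reference", "event Send goods")`. This naive version compares every pair of traces, so it is meant to show the rule semantics only.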
19. 3. Rule Based
Suggesting rules
Look at all attribute values in the log
Make a rule for every equal match in both logs
Count the number of linked traces for every rule
Filter rules with only one link
Sort so that rules closer to a 1-to-1 match are
higher in the list
• rules that make more or fewer links are lower in the list
• if no 1-to-1 rule exists, the ‘best’ rule is still on top
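The suggestion steps above could be sketched as below. Several points are assumptions: traces are simplified to flat dicts of attribute name → value, "filter rules with only one link" is read as discarding those rules, and "closer to 1-to-1" is measured as the distance between the number of links and the size of the smaller log. The function names are hypothetical.

```python
from collections import defaultdict

def index_values(log):
    """Map each attribute value to the (attribute name, trace index) pairs holding it."""
    index = defaultdict(set)
    for i, trace in enumerate(log):
        for name, value in trace.items():
            index[value].add((name, i))
    return index

def suggest_rules(log1, log2):
    index1, index2 = index_values(log1), index_values(log2)
    links = defaultdict(set)   # (name1, name2) -> set of linked (i, j) trace pairs
    # Make an equals-rule for every value that occurs in both logs.
    for value in index1.keys() & index2.keys():
        for name1, i in index1[value]:
            for name2, j in index2[value]:
                links[(name1, name2)].add((i, j))
    # Count links per rule and drop rules with only one link (assumed reading).
    rules = [(rule, len(pairs)) for rule, pairs in links.items() if len(pairs) > 1]
    # Assumed ranking: the closer the link count is to the smaller log's size, the higher.
    target = min(len(log1), len(log2))
    return sorted(rules, key=lambda r: abs(r[1] - target))
```

Building the two indexes takes one pass over each log, which is where a hash-based approach gets its speed; the pairing loop then only touches values that actually match.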
20. 3. Rule Based
Some remarks
User can configure rules or select from the
suggestion list
Suggestion list is currently limited to equals-rules
but is calculated very fast (order n1 + n2)
Rules can be combined with And or Or
By explicitly selecting rules, the approach is more
transparent
Possible use as shortcut for merging logs from
within one system
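Combining rules with And or Or can be pictured as set operations on the links each rule produces. This is only an illustrative reading, assuming every rule yields a collection of (trace in log 1, trace in log 2) index pairs; the function names are hypothetical.

```python
def combine_and(links_a, links_b):
    """Keep only the links that both rules agree on."""
    return set(links_a) & set(links_b)

def combine_or(links_a, links_b):
    """Keep the links made by at least one of the rules."""
    return set(links_a) | set(links_b)
```

For instance, rules producing {(0, 0), (1, 1)} and {(1, 1), (2, 2)} give {(1, 1)} under And and all three links under Or.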
21. Contact information
Jan Claes
jan.claes@ugent.be
http://processmining.ugent.be
Twitter: @janclaesbelgium
Pav D8.a (until February 10)