FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION




Merging Computer Log Files for Process Mining:
   An Artificial Immune System Technique
                                   Jan Claes and Geert Poels
                                http://processmining.ugent.be




Ghent University, Faculty of Economics and Business Administration                                 Jan Claes for EIS 2011
Department of Management Information and Operations Management                                         30 October, 2011
Process Mining

Processes are supported by IT systems
IT systems record actual process data
Process data can be used to
   Discover process model
   Check conformance with existing process info
   Improve or extend existing process model
Attention: process mining shows
        Only the as-is process
        Only (correctly) recorded information
Process data in event logs

[Figure: from the process, via process support, to recorded events, grouped events and the event log]
Process Mining steps

 Preparation
             Collect data: find event information
             Merge data: from different sources
             Structure data: group per instance
             Convert data: to tool specific format
 Process mining
 Make decisions, take action
 Legend
             Manual task: analysts needed in most cases
             Automated task: less human involvement needed
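
The "Structure data: group per instance" step can be illustrated with a minimal sketch. It assumes each event is a dictionary; the key names `case_id` and `timestamp` are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def group_events_by_case(events, case_key="case_id", time_key="timestamp"):
    """Group flat event records into traces, one trace per process instance.

    The key names are illustrative assumptions; real logs will differ.
    """
    traces = defaultdict(list)
    for event in events:
        traces[event[case_key]].append(event)
    # Order the events of each trace chronologically.
    for trace in traces.values():
        trace.sort(key=lambda e: e[time_key])
    return dict(traces)
```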
Merging log files

My research: merging log files
Merging log files




1. Find links between traces
2. Merge events chronologically
3. Add unlinked traces
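
A minimal sketch of steps 2 and 3, assuming step 1 has already produced a list of links. The data layout (a log as a dictionary from trace id to a list of `(timestamp, activity)` pairs) and the function name `merge_logs` are assumptions made for illustration only.

```python
def merge_logs(log1, log2, links):
    """Merge two logs given links between their traces.

    log1, log2: dict mapping trace id -> list of (timestamp, activity)
    links: list of (trace id in log1, trace id in log2) pairs from step 1
    """
    merged = []
    linked1 = {id1 for id1, _ in links}
    linked2 = {id2 for _, id2 in links}

    # Step 2: merge the events of linked traces chronologically.
    for id1, id2 in links:
        merged.append(sorted(log1[id1] + log2[id2], key=lambda e: e[0]))

    # Step 3: add the traces that were not linked to anything.
    merged.extend(list(t) for tid, t in log1.items() if tid not in linked1)
    merged.extend(list(t) for tid, t in log2.items() if tid not in linked2)
    return merged
```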
Find links

Required properties of solution
        Finds traces in both log files that belong to the
         same process execution
        Without prior knowledge about the provided log
         files (as generic as possible)
        But with maximal possibilities for the (expert) user
         to include their knowledge about the log files



Find links

Proposed solution
       Take the best possible guess based on assumptions
       Include multiple indicator factors in analysis
       Calculate factor scores for each analysed solution
       Combine factor scores into global score per solution
       ‘Best guess’ is the solution with the highest combined score:
        under the assumed indicators,
        most of the evidence points to this solution
       Provide user interaction possibilities
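
The scoring idea above can be sketched as follows. The slides do not say how factor scores are combined, so the weighted sum used here is only an assumption, and the names `global_score` and `best_guess` are hypothetical.

```python
def global_score(factor_scores, weights):
    """Combine per-factor scores into one score (weighted sum, an assumption)."""
    return sum(weights[name] * score for name, score in factor_scores.items())


def best_guess(candidates, weights):
    """Return the name of the candidate solution with the highest global score.

    candidates: dict mapping solution name -> dict of factor name -> score
    """
    return max(candidates, key=lambda name: global_score(candidates[name], weights))
```

The weights are one place where an expert user could inject knowledge, for example by raising the weight of a factor known to be reliable for the logs at hand.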
Decisions to make

Which indicator factors?
How to calculate a score for each factor?
How to combine factor scores to global score?
Which solutions to analyse?
 (analyse = calculate & compare scores)
Which user interactions to offer for including (expert)
 user knowledge?
                                                                     See paper for more details
Indicator factors

Same trace identifier
        Assumption: If both logs contain a trace with the
         same id, there is a very high chance they match
        Not always though (e.g. customer id vs. order id)

                        Log 1 trace id                               Log 2 trace id
                        16                                           10
                        17                                           12
                        18                                           14
                        19                                           16
                        20                                           18
                        21                                           20
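
As a small illustration of this factor, the sketch below finds the identifiers that occur in both columns of the example. It assumes the left column belongs to one log and the right column to the other; the function name is hypothetical.

```python
def same_id_candidates(ids1, ids2):
    """Return the trace ids that appear in both logs, in sorted order."""
    return sorted(set(ids1) & set(ids2))


log1_ids = [16, 17, 18, 19, 20, 21]
log2_ids = [10, 12, 14, 16, 18, 20]
print(same_id_candidates(log1_ids, log2_ids))  # [16, 18, 20]
```

A shared id is only a candidate link: as noted above, the two logs may number different things (customers versus orders).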


Indicator factors

Equal attribute values
        Assumption: The more attributes of a trace and its
         events from both logs are equal, the higher the
         chance they match

                        16     JAN 12:00                             17   JC 14 14:00
                        17     JAN 12:10                             18   JC 15 14:10
                        18     JAN 12:20                             19   JC 16 14:20
                        19     JAN 12:30                             1A   JC 17 14:30
                        20     JAN 12:40                             1B   JC 18 14:40
                        21     JAN 12:50                             1C   JC 19 14:50
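
One possible way to turn this assumption into a number is to count the attribute values two traces have in common. The sketch below is an illustration under assumed data structures (a trace as a dictionary with `attributes` and `events`), not the scoring used in the paper.

```python
def equal_value_count(trace1, trace2):
    """Count distinct attribute values shared by two traces.

    Each trace is assumed to look like
    {"attributes": {...}, "events": [{...}, ...]} with hashable values.
    """
    def values(trace):
        found = set(trace["attributes"].values())
        for event in trace["events"]:
            found.update(event.values())
        return found

    return len(values(trace1) & values(trace2))
```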


Test results

Simulated data (300-400 ms on a standard laptop)
        Benefit of controllable parameters, known solution
        Correct number of linked traces in all tests
        Perfect results for same trace id and up to 50%
         noise, worse results for higher overlap of traces
Real data (6-10 min on standard laptop)
        Correct number of linked traces in all tests
        Almost perfect results for same trace id and up to
         50% noise, worse results for higher overlap
New approach

Rule Based Merger
        User has to configure rules for linking traces
        Rule = relationship between attributes in both logs
        Events of linked traces are merged chronologically
“Merge all traces where
  attribute A of the trace in log 1 equals
  attribute B of any event in the trace in log 2”
Select attributes, contexts and operator
Research focus: suggesting merging rules
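
The quoted rule can be sketched as below. The trace layout and the names `rule_matches` and `find_links` are hypothetical; only the equality operator from the example rule is shown.

```python
def rule_matches(trace1, trace2, attr_a, attr_b):
    """True if attribute attr_a of trace1 equals attr_b of any event in trace2.

    Traces are assumed to look like {"attributes": {...}, "events": [{...}, ...]}.
    """
    value = trace1["attributes"].get(attr_a)
    if value is None:
        return False
    return any(event.get(attr_b) == value for event in trace2["events"])


def find_links(log1, log2, attr_a, attr_b):
    """Return (id1, id2) pairs of traces linked by the rule."""
    return [
        (id1, id2)
        for id1, t1 in log1.items()
        for id2, t2 in log2.items()
        if rule_matches(t1, t2, attr_a, attr_b)
    ]
```

The events of each linked pair would then be merged chronologically, as stated above.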
Contact information




                                                Jan Claes
                                                jan.claes@ugent.be

                                                http://processmining.ugent.be
                                                Twitter: @janclaesbelgium




