By
Kashif Kashif
Kashif.namal@gmail.com
University of Camerino Italy
Process Mining
 A process management technique that allows for the
analysis of business processes based on event logs.
 Algorithms are applied to event log datasets to find
patterns and details contained in the event logs recorded
by an information system.
 The objective is to make processes more efficient and to
improve them.
Classification
 Discovery
A discovery technique takes an event log and
produces a process model without using any a-priori
information.
 Conformance checking
An existing process model is compared with an
event log of the same process.
 Enhancement
The main idea is to extend or improve an existing process
model using information about the actual process
recorded in some event log.
Approaches Used
 Direct Algorithmic Approaches
 Two-Phase Approaches
 Computational Intelligence Approaches
 Partial Approaches
Direct Algorithmic Approaches
 Extract a footprint from the event log and use this
footprint to directly construct a process model.
 Approaches based on language-based regions also fall
into this family.
 Ordering relations are extracted from the log, and based
on these relations a Petri net is constructed.
 The Alpha algorithm is an example of a direct approach.
 We apply an algorithm to the log and directly derive
the process model.
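The footprint idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a log is given as a list of traces (tuples of activity names); the function name `footprint` and the example log are hypothetical, and the sketch covers only the footprint step, not the construction of the Petri net.

```python
from itertools import product

def footprint(log):
    """Return the activities and the footprint table of a log (list of traces)."""
    activities = sorted({a for trace in log for a in trace})
    # Directly-follows pairs: (a, b) if b immediately follows a in some trace.
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    table = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in follows and (b, a) not in follows:
            table[a, b] = "->"   # causality: a is followed by b, never the reverse
        elif (b, a) in follows and (a, b) not in follows:
            table[a, b] = "<-"   # reverse causality
        elif (a, b) in follows and (b, a) in follows:
            table[a, b] = "||"   # both orders occur: treated as parallel
        else:
            table[a, b] = "#"    # the two never directly follow each other
    return activities, table

# Hypothetical example log with three distinct traces.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
activities, table = footprint(log)
```

Here `b` and `c` occur in both orders, so the table marks them as parallel, while `a` always precedes `b`, which is marked as causality.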
Two-Phase Approaches
 Use a two-step approach in which a “low-level model” (e.g., a
transition system or Markov model) is constructed first.
 In the second step, the low-level model is converted into a “high-level model”
that can express concurrency and other (more advanced) control-flow
patterns.
 A transition system is extracted from the log using a customizable
abstraction mechanism.
 The transition system is converted into a Petri net using so-called state-based
regions. The resulting model can be visualized as a Petri net, but can
also be converted into other notations (e.g., BPMN and EPCs).
 Similar approaches can be envisioned using hidden Markov models.
Using an Expectation-Maximization (EM) algorithm such as the Baum–Welch
algorithm, the “most likely” Markov model can be derived from
a log.
 This low-level model is then converted into a high-level model.
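The first phase can be illustrated with a small Python sketch. It assumes a log is a list of traces and that the abstraction maps the prefix seen so far to a state; the default used here (the set of activities already executed) is only one possible choice, and the name `build_transition_system` is hypothetical. The second phase (state-based regions) is not shown.

```python
from collections import defaultdict

def build_transition_system(log, abstraction=frozenset):
    """Build a transition system from a log.

    Each state is the abstraction of the trace prefix seen so far. The
    default abstraction keeps only the set of activities already executed.
    """
    transitions = defaultdict(set)  # state -> {(activity, next_state)}
    for trace in log:
        for i, activity in enumerate(trace):
            source = abstraction(trace[:i])
            target = abstraction(trace[:i + 1])
            transitions[source].add((activity, target))
    return dict(transitions)

# Hypothetical example log: b and c occur in either order.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
ts = build_transition_system(log)
```

With the set abstraction, the states {a, b} and {a, c} both lead to {a, b, c}, which is the "diamond" shape that a later step can translate into concurrency. Passing `abstraction=tuple` instead keeps the full prefix and yields a tree of prefixes.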
Hidden Markov Model
 Set of states: {s1, s2, s3, ..., sn}
 The process moves from one state to another, generating
a sequence of states: s1, s2, ...
 Markov chain property: the probability of each subsequent
state depends only on the previous state.
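Because each state depends only on the previous one, the transition probabilities of a plain (fully observed) Markov chain can be estimated by counting adjacent pairs. The sketch below assumes state sequences are given as Python lists; the function name and the example sequences are hypothetical. For a hidden Markov model the states are not observed, so an EM procedure such as Baum–Welch is needed instead of simple counting.

```python
from collections import Counter, defaultdict

def estimate_transition_probabilities(sequences):
    """Estimate P(next state | previous state) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }

# Hypothetical observed state sequences.
sequences = [["s1", "s2", "s3"], ["s1", "s2", "s2", "s3"], ["s1", "s3"]]
probs = estimate_transition_probabilities(sequences)
```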
Hidden Markov Model
 We want to infer a robot's mood, happy or sad, from what we
observe it doing: watching a movie (w), sleeping (s),
crying (c), or using Facebook (f).
 The mood X is hidden (unknown): X = h if the robot is happy, X = s if it is sad.
The observation Y is one of w, s, c, or f.
 We want to answer queries such as:
 P(X = h | Y = f)?
 P(X = s | Y = c)?
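For a single observation, such a query can be answered with Bayes' rule. The sketch below is only an illustration: the prior and emission probabilities are made-up values, not taken from the slides, and the names `prior`, `emission`, and `posterior` are hypothetical. Queries over a whole sequence of observations would additionally need the transition probabilities (for example via the forward algorithm).

```python
# All probabilities below are made-up illustration values, not from the slides.
prior = {"h": 0.5, "s": 0.5}  # P(X): happy or sad
emission = {                   # P(Y | X) for Y in w, s, c, f
    "h": {"w": 0.4, "s": 0.1, "c": 0.1, "f": 0.4},
    "s": {"w": 0.1, "s": 0.3, "c": 0.5, "f": 0.1},
}

def posterior(state, observation):
    """P(X = state | Y = observation) by Bayes' rule."""
    evidence = sum(prior[x] * emission[x][observation] for x in prior)
    return prior[state] * emission[state][observation] / evidence

p_happy_given_facebook = posterior("h", "f")
p_sad_given_crying = posterior("s", "c")
```

Under these assumed numbers, seeing the robot on Facebook makes "happy" four times as likely as "sad", because P(f | h) is four times P(f | s) and the prior is uniform.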
Computational Intelligence Approaches
 Techniques originating from the field of computational intelligence form the
basis for the third family of process discovery approaches.
 Examples of techniques are genetic programming, genetic algorithms,
simulated annealing, reinforcement learning, machine learning, neural
networks, fuzzy sets, rough sets, and swarm intelligence.
 The log is not converted into a model directly; instead, an iterative procedure
is used that mimics the process of natural evolution.
 The genetic process mining approach starts with an initial population of
individuals. Each individual corresponds to a randomly generated process
model. For each individual a fitness value is computed describing how well the
model fits the log.
 Populations evolve by selecting the fittest individuals and generating new
individuals using genetic operators such as crossover (combining parts of two
individuals) and mutation (random modification of an individual). The fitness
gradually increases from generation to generation. The process stops once an
individual of acceptable quality is found.
Machine Learning
 Determine rules from data/facts
 Improve performance with experience
 Getting computers to program themselves
Sketch of an Induction Algorithm
 Calculate, for each attribute, how well it classifies
the elements of the training set
 Classify with the best attribute
 For each resulting subtree, repeat the first two steps
 Stop this recursive process as soon as a termination
condition is satisfied
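The recursive sketch above can be written down as a small decision-tree learner. The slides do not say how "how well it classifies" is measured; the version below assumes information gain (as in ID3), and the attribute names and training rows are hypothetical.

```python
import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    return -sum(n / len(rows) * math.log2(n / len(rows)) for n in counts.values())

def information_gain(rows, attr, target):
    # Entropy before the split minus the weighted entropy after it.
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder

def induce_tree(rows, attributes, target):
    labels = Counter(r[target] for r in rows)
    # Termination: all rows share one label, or no attributes remain.
    if len(labels) == 1 or not attributes:
        return labels.most_common(1)[0][0]
    # Score every attribute and split on the best one.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    # Recurse into each resulting subtree.
    return {best: {
        value: induce_tree([r for r in rows if r[best] == value], rest, target)
        for value in {r[best] for r in rows}
    }}

# Hypothetical training set: should an order be checked or not?
rows = [
    {"amount": "high", "customer": "new", "outcome": "check"},
    {"amount": "high", "customer": "known", "outcome": "check"},
    {"amount": "high", "customer": "known", "outcome": "check"},
    {"amount": "low", "customer": "new", "outcome": "check"},
    {"amount": "low", "customer": "known", "outcome": "skip"},
    {"amount": "low", "customer": "known", "outcome": "skip"},
]
tree = induce_tree(rows, ["amount", "customer"], "outcome")
```

The result is a nested dictionary: the outer key is the attribute chosen at the root, and each value is either a class label or another subtree.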
Partial Approaches
 The approaches described so far produce a complete end-to-end process
model.
 It is also possible to focus on rules or frequent patterns,
for example with an approach for mining sequential patterns.
 This approach is similar to the discovery of association
rules; however, now the order of events is taken into
account.
 In frequent episode mining, a sliding window is used to analyze how
frequently an “episode” (a partial order) appears.
 Approaches also exist to learn declarative (LTL-based)
languages like Declare.
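The sliding-window idea can be sketched as follows. This is a simplified illustration: the window is measured in number of events rather than in time, only serial episodes (a fixed order) are handled, and the function name and example sequence are hypothetical.

```python
def episode_frequency(sequence, episode, width):
    """Fraction of sliding windows of `width` events that contain `episode`
    as a subsequence (a serial episode: the given order must be respected)."""
    def contains(window):
        it = iter(window)
        # `step in it` consumes the iterator, so later steps must occur later.
        return all(step in it for step in episode)

    windows = [sequence[i:i + width] for i in range(len(sequence) - width + 1)]
    if not windows:
        return 0.0
    return sum(contains(w) for w in windows) / len(windows)

# Hypothetical event sequence of single-letter activities.
sequence = list("abcabdacb")
freq_ab = episode_frequency(sequence, ("a", "b"), width=3)
freq_ba = episode_frequency(sequence, ("b", "a"), width=3)
```

An episode would then be reported as frequent if its frequency exceeds a chosen threshold; that threshold is a parameter of the analysis and is not specified here.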
PROLOG
 PROLOG (=PROgramming in LOGic) is a
programming language based on Horn clauses
 father(peter,mary).
 father(peter,john).
 mother(mary,mark).
 mother(jane,mary).
 grandfather(X,Z) :- father(X,Y), father(Y,Z).
 grandfather(X,Z) :- father(X,Y), mother(Y,Z).
Heuristics Miner
 The Heuristics Miner is a practically applicable mining algorithm
that can deal with noise and can be used to express the
main behavior (that is, not all details and exceptions)
registered in an event log.
 Extends the Alpha algorithm by considering the frequency of
traces in the log.
 The Heuristics Miner plug-in mines the control-flow
perspective of a process model.
 Considers the order of the events within a case.
 Takes frequencies of events and sequences
into account when constructing a process model.
Steps
 The construction of the dependency graph
 For each activity, the construction of the input and output
expressions
 The search for long distance dependency relations
 1. Read a log
 2. Get the set of tasks
 3. Infer the ordering relations based on their frequencies
 4. Build the net based on inferred relations
 5. Output the net
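The dependency-graph step can be sketched with the dependency measure commonly given for the Heuristics Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The threshold value, the function name, and the example log below are assumptions for illustration; the input/output expressions and long-distance dependencies are not covered.

```python
from collections import Counter

def dependency_graph(log, threshold=0.6):
    """Keep edges a -> b whose dependency score reaches the threshold."""
    # Count how often b directly follows a over all traces.
    follows = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))
    deps = {}
    for (a, b), ab in follows.items():
        if a == b:
            continue  # length-one loops need a separate measure, omitted here
        ba = follows.get((b, a), 0)
        score = (ab - ba) / (ab + ba + 1)
        if score >= threshold:
            deps[a, b] = score
    return deps

# Hypothetical log: nine regular traces and one noisy trace.
log = [("a", "b", "c")] * 9 + [("a", "c", "b")]
deps = dependency_graph(log)
```

The single noisy trace produces the pairs (a, c) and (c, b), but their scores stay below the threshold, so only the frequent behavior a -> b -> c ends up in the graph.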
Genetic Miner
 The Genetic Miner uses a genetic algorithm to mine a Petri
net representation of the process model from
execution traces.
 It uses a global search strategy: the quality (fitness) of a
candidate model is calculated by comparing the
process model with all traces in the event log, so the
search takes place at a global level. For a local
strategy there is no guarantee that the outcome of the
locally optimal steps will be a globally optimal model.
Steps
 The first concern is to define the internal representation.
 The second concern is to define the fitness measure.
 The third concern relates to the genetic operators
(crossover and mutation).
 Read the event log
 Build the initial population
 Calculate the fitness of the individuals in the population
 If the stop criterion is met, stop and return the fittest individuals
 Otherwise, create the next population and recalculate the fitness
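The loop above can be sketched as a toy genetic algorithm. This is not the Genetic Miner itself: as a simplifying assumption, an individual is just a set of allowed directly-follows edges instead of a Petri net, the fitness measure and all parameter values are made up for illustration, and the search stops after a fixed number of generations.

```python
import random

def fitness(model, log, penalty=0.02):
    """Share of traces whose every step is an edge of the model, minus a
    small penalty per edge so that overly permissive models score lower."""
    replayable = sum(all(pair in model for pair in zip(t, t[1:])) for t in log)
    return replayable / len(log) - penalty * len(model)

def crossover(p1, p2, edges, rng):
    # Each candidate edge is inherited from one randomly chosen parent.
    return frozenset(e for e in edges if e in (p1 if rng.random() < 0.5 else p2))

def mutate(model, edges, rng, rate=0.05):
    # Flip membership of each candidate edge with a small probability.
    flipped = {e for e in edges if rng.random() < rate}
    return frozenset(model ^ flipped)

def genetic_miner(log, generations=30, size=20, elite=2, seed=0):
    rng = random.Random(seed)
    acts = sorted({a for t in log for a in t})
    edges = [(a, b) for a in acts for b in acts if a != b]
    # Build the initial population of random models.
    population = [frozenset(e for e in edges if rng.random() < 0.5)
                  for _ in range(size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, log), reverse=True)
        history.append(fitness(population[0], log))
        parents = population[: size // 2]  # select the fittest half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), edges, rng),
                   edges, rng)
            for _ in range(size - elite)
        ]
        population = population[:elite] + children  # elites survive unchanged
    best = max(population, key=lambda m: fitness(m, log))
    return best, history

# Hypothetical example log.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
best, history = genetic_miner(log)
```

Because the best individuals are copied unchanged into the next population, the best fitness recorded in `history` never decreases from one generation to the next.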
Fuzzy Miner
 Process mining is a technique for extracting process
models from execution logs.
 People have an idealized view of reality.
 Real-life processes turn out to be less structured than
people tend to believe.
 As a result, mined models are often spaghetti-like.
Output
 Phase I: Fuse similarly behaving attributes
 Phase II: Generate meta-rules
 Phase III: Generate frequent fuzzy itemsets
 Phase IV: Generate fuzzy association rules
Questions

Process mining approaches kashif.namal@gmail.com
