This document describes a method for analyzing dependencies between Key Performance Indicators (KPIs) and lower-level metrics in business processes. It involves defining KPIs and metrics, monitoring process instances, and using classification algorithms like decision trees to learn relationships between metrics and KPI classes from historical data. The approach automates dependency analysis, is efficient compared to manual methods, and produces understandable decision tree models. Potential limitations include needing historical event logs to train models and ensuring all relevant data can be monitored.
Analyzing Business Process Performance with KPI Dependency Trees
1. S-Cube Learning Package
Analyzing Business Process Performance
Using KPI Dependency Analysis
University of Stuttgart (USTUTT), TU Wien (TUW)
Branimir Wetzstein, USTUTT
www.s-cube-network.eu
2. Learning Package Categorization
S-Cube → Adaptable Coordinated Service Compositions → Adaptable and QoS-aware Service Compositions → Analyzing Business Process Performance Using KPI Dependency Analysis
4. Let’s Consider a Scenario (1)
Assume we have implemented a business process as a service orchestration
It is a reseller process which interacts with external services of the customer, suppliers, bank, and shipper, and with internal services such as the warehouse
5. Let’s Consider a Scenario (2)
We are interested in measuring the performance of the business process (time, cost, quality, customer satisfaction)
This is done by defining Key Performance Indicators (KPIs), which specify target values on key metrics based on business goals
– A KPI target value function maps metric value ranges to KPI classes (e.g., “good”, “medium”, “bad”); a sketch follows after this list
Some typical KPI metrics in our scenario
– Order Fulfillment Lead Time
– Perfect Order Fulfillment (in time and in full)
– Customer Complaint Rate
– Availability of the reseller service
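A minimal sketch of such a target value function in Python (the class names are from this slide; the thresholds are borrowed from the KPI definition example later in the deck):

```python
def order_fulfillment_kpi_class(lead_time_days: float) -> str:
    """Map the KPI metric 'Order Fulfillment Lead Time' to a KPI class.

    Thresholds follow the later example (m < 2 days, 2-4 days, otherwise).
    """
    if lead_time_days < 2:
        return "good"
    elif lead_time_days < 4:
        return "medium"
    else:
        return "bad"
```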
6. Let’s Consider a Scenario (3)
In the first step, KPIs are monitored at process runtime for a
set of process instances (what?)
If monitoring of KPIs shows unsatisfying results, we want to
be able to analyze and explain the violations (why?)
That is not trivial as a KPI often depends on many influential
factors measured by lower-level metrics
[Figure: the KPI “Order Fulfillment Lead Time” measures the Purchase Order Process; it depends on process performance metrics (PPMs such as availability in stock, customer, products, …) and on QoS metrics (such as service availability and response time).]
9. Architectural Overview (2)
Model and deploy the business process (e.g., in WS-BPEL)
Define and monitor a set of KPIs and potential influential
metrics
– Event-based monitoring based on CEP
– Supporting in particular both process events and QoS events and their
correlation
Train a decision tree (KPI Dependency Tree) from monitored
data
– Gather monitored data from Metrics DB
– Classify the monitored process instances according to their KPI class
– Use Decision Tree Learning Algorithms to learn the dependencies of
the KPI and the lower-level metrics
10. Background:
Event-Based Monitoring
In order to be used for analysis, runtime data needs to be monitored
Event-based monitoring is a commonly used approach to implement this
Basic principle:
Register for and receive some lifecycle events from the service
composition and use Complex Event Processing (CEP) to extract,
correlate and aggregate monitoring data from raw event data
Can be used to monitor both QoS and domain-specific data
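A minimal sketch of this principle in plain Python, standing in for a CEP engine such as ESPER (the event shapes and field names are illustrative assumptions):

```python
from collections import defaultdict

# Hypothetical raw lifecycle events; field names are assumptions.
events = [
    {"instance": "p1", "type": "process_started", "ts": 0.0},
    {"instance": "p1", "type": "service_invoked", "service": "shipper", "response_time": 1.2},
    {"instance": "p1", "type": "process_ended", "ts": 30.5},
    {"instance": "p2", "type": "process_started", "ts": 5.0},
    {"instance": "p2", "type": "process_ended", "ts": 98.0},
]

# Correlate raw events per process instance (what CEP correlation rules do).
by_instance = defaultdict(list)
for ev in events:
    by_instance[ev["instance"]].append(ev)

# Aggregate: derive a duration metric per instance from its lifecycle events.
for inst, evs in by_instance.items():
    start = next(e["ts"] for e in evs if e["type"] == "process_started")
    end = next(e["ts"] for e in evs if e["type"] == "process_ended")
    print(inst, "duration:", end - start)
```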
11. Background:
Monitoring of Service Orchestrations
Our monitoring approach for service orchestrations supports:
– Process Performance Metrics (PPMs) based on process events (BPEL event model)
– QoS metrics based on QoS events provided by QoS monitors
– Correlation of Process events and QoS events
– Metric calculation based on Complex Event Processing (ESPER)
[Figure: monitoring architecture. The process engine and a QoS monitor publish events; an event listener service feeds them to the Complex Event Processing engine, which evaluates the metric definitions and writes the resulting metrics to the metrics database and a dashboard.]
12. KPI Dependency Analysis - Motivation
So far we are able to monitor metrics and find out which KPI targets are
violated (what?)
In the next step, we want to explain the violations (why?)
That is not trivial as a KPI often depends on many influential factors
measured by lower-level metrics
Typically, such an analysis is done manually (if at all) by a business analyst using OLAP queries on a data warehouse
– that is very cumbersome and time-consuming
– we want to “discover” the problems in an automated way
– therefore we can use data mining techniques
In particular, we construct a classification problem and use existing
classification learning techniques (decision trees)
13. Background:
Machine Learning and Data Mining
Automated discovery of interesting patterns from large
amounts of data (stored typically in data warehouses)
– Manual discovery (e.g., by using OLAP queries) could take days or
weeks
Functionalities include:
– Mining of association rules, correlation analysis
– Classification and Prediction – our focus here!
– Clustering
– Time-series analysis
– Graph mining and text mining
Interdisciplinary field using techniques from machine learning,
statistics, pattern recognition, data visualization
14. Background:
Classification Learning (1)
Given: a (historical) dataset containing a set of instances described in
terms of:
– A set of explanatory (a.k.a. predictive) categorical or numerical attributes
– A categorical target attribute (a.k.a. class)
Goal: based on the historical dataset (“supervised learning”) create a
classification model which helps…
– explaining the dependencies between the class and the explanatory attributes in historical data (interpretation)
– making predictions about future data, i.e., predicting the class from future explanatory attribute values (prediction)
Some Classification Learning techniques:
– Decision Trees
– Classification Rules
– Support Vector Machines
15. Background:
Classification Learning (2)
[Figure: classification learning workflow. Training phase: a classification algorithm learns a classification model from training data. Test phase: the model is evaluated on test data. Prediction phase: explanatory attribute values of new data are fed to the model, which outputs the predicted class. Interpretation phase: the model itself is inspected as knowledge.]
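These phases can be sketched with scikit-learn’s decision tree, a stand-in for the WEKA toolkit used in the prototype (the toy dataset is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy historical dataset: explanatory attributes X, categorical target y.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Split historical data into training and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # training phase
print("test accuracy:", model.score(X_test, y_test))                  # test phase
print("predicted class:", model.predict(X_test[:1]))                  # prediction phase
```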
16. Background:
Decision Tree Learning
[Figure: example decision tree.]
– A non-leaf node represents an explanatory categorical or numeric attribute (A1, A2, …)
– Outgoing edges represent conditions on the parent node’s explanatory attribute values (e.g., < 2, 2–4, > 4)
– A leaf node represents a target attribute class (C1, C2, …) and shows the corresponding number of instances from the training set (e.g., 80/2: 80 instances reach the leaf, 2 of them misclassified)
– A path shows which attribute values lead to a certain class
17. KPI Dependency Analysis
The KPI class of a process instance (alt. choreography instance, activity
instance, business object, …) depends on a set of influential factors
(PPMs and QoS metrics)
For finding out those dependencies, we use classification learning:
– The data set consists of a set of (historical) process instances; for each
process instance the KPI class and a set of metrics is evaluated
– The KPI is the target attribute which maps values of the underlying metric to
categorical values (KPI classes)
– The potential influential lower-level metrics are the explanatory attributes
(predictive variables)
– Goal: Based on a set of monitored instances, create a classification model (decision tree) that identifies recurring relationships among the explanatory attributes describing the instances belonging to the same KPI class
The decision tree (KPI dependency tree) can be used to explain the KPI classes of past process instances and also to predict the class of process instances for which only the values of some of the lower-level metrics are known
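A compact sketch of this classification setup, using pandas and scikit-learn’s DecisionTreeClassifier in place of the prototype’s WEKA/J48 backend (metric names and values are illustrative assumptions):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row is a monitored process instance: lower-level metrics
# (explanatory attributes) plus the evaluated KPI class (target attribute).
data = pd.DataFrame({
    "delivery_time_supplier_h": [28, 4, 32, 6, 40, 5],
    "order_in_stock":           [0, 1, 0, 1, 0, 1],   # 1 = yes, 0 = no
    "service_availability":     [1.00, 0.84, 0.90, 0.99, 0.95, 0.97],
    "kpi_class":                ["red", "green", "red", "green", "red", "green"],
})

X = data.drop(columns="kpi_class")   # explanatory attributes (metrics)
y = data["kpi_class"]                # target attribute (KPI class)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Interpretation: the printed tree is the KPI dependency tree.
print(export_text(tree, feature_names=list(X.columns)))

# Prediction: classify a new instance from its lower-level metric values.
new_instance = pd.DataFrame([{"delivery_time_supplier_h": 30,
                              "order_in_stock": 0,
                              "service_availability": 0.9}])
print(tree.predict(new_instance))
```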
18. Defining KPIs
The KPI Definition includes:
– KPI metric definition (e.g., order fulfillment time)
– A set of categorical values defining the KPI classes, at least 2 (e.g.,
“green”, “yellow”, “red”)
– Target value function mapping KPI metric values to KPI classes (e.g., m < 2 days → green, 2 days ≤ m < 4 days → yellow, otherwise red); a code sketch of such a definition follows after this list
The KPI metric is specified for a monitored entity type:
– Process Instance (e.g., duration of a reseller process instance)
– Activity Instance (e.g., duration of the supplier service invocation)
– Choreography Instance (e.g., duration)
– Service endpoint (e.g., availability)
– Set of Process Instances per day (e.g., average duration)
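One possible way to capture such a KPI definition in code (a hypothetical structure for illustration, not part of the prototype):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class KPIDefinition:
    metric: str                               # KPI metric, e.g. "order fulfillment time"
    entity_type: str                          # monitored entity type, e.g. "process_instance"
    classes: tuple                            # KPI classes, at least 2
    target_function: Callable[[float], str]  # maps KPI metric values to classes

def fulfillment_target(days: float) -> str:
    # Target value function from this slide: m < 2 days -> green,
    # 2 days <= m < 4 days -> yellow, otherwise red.
    if days < 2:
        return "green"
    if days < 4:
        return "yellow"
    return "red"

kpi = KPIDefinition(
    metric="order fulfillment time",
    entity_type="process_instance",
    classes=("green", "yellow", "red"),
    target_function=fulfillment_target,
)
print(kpi.target_function(3))  # -> "yellow"
```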
19. Generating Metric Definitions
A set of metric definitions (representing potential influential factors) can be
automatically generated
We support rules to automatically generate the following metrics (a sketch follows after this list):
– Service invocation:
- availability and response time of invoked service (both for synchronous
and asynchronous invocations (invoke-receive))
– WS-BPEL invoke activity (other basic activities are not interesting for long-running processes):
- execution time of the activity (i.e. the time between starting and finishing
the activity)
- If part of a loop, in addition:
- Average/minimum/maximum execution time per process instance
- number of executions per process instance
– For every branching activity and for fault, compensation, and event handlers, we generate a metric representing the branch that has been executed
Metrics based on process variable data elements are created manually
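A sketch of such generation rules over a hypothetical, simplified process-model representation (the actual prototype works on WS-BPEL definitions):

```python
# Hypothetical, simplified process model: activity name, type, loop flag.
activities = [
    {"name": "invokeSupplier", "type": "invoke", "in_loop": True},
    {"name": "invokeShipper",  "type": "invoke", "in_loop": False},
    {"name": "checkStock",     "type": "if",     "in_loop": False},
]

def generate_metric_definitions(activities):
    metrics = []
    for act in activities:
        if act["type"] == "invoke":
            # Availability and response time of the invoked service,
            # plus the execution time of the invoke activity itself.
            metrics += [f"{act['name']}.availability",
                        f"{act['name']}.response_time",
                        f"{act['name']}.execution_time"]
            if act["in_loop"]:
                # Aggregates per process instance for looped invocations.
                metrics += [f"{act['name']}.execution_time.avg",
                            f"{act['name']}.execution_time.min",
                            f"{act['name']}.execution_time.max",
                            f"{act['name']}.executions_per_instance"]
        elif act["type"] in ("if", "switch", "faultHandler", "eventHandler"):
            # Record which branch / handler was executed.
            metrics.append(f"{act['name']}.executed_branch")
    return metrics

print(generate_metric_definitions(activities))
```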
20. Data Preparation and Learning
Create a KPI Analysis Model
– Select KPI + a set of potential influential factors
Gather metric values of monitored entity instances and create a training set:
– Each monitored entity instance with its KPI class and influential metric values maps to a row in the training set
A decision tree is learned (e.g., using the J48 algorithm)
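A sketch of this data-preparation step, with an in-memory sqlite3 database standing in for the prototype’s MySQL metrics DB (schema, table, and column names are assumptions):

```python
import sqlite3
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed schema: one row of metric values + KPI class per monitored instance.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE metrics (
    instance_id TEXT, delivery_time_h REAL, in_stock INTEGER, kpi_class TEXT)""")
con.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", [
    ("p1", 28.0, 0, "red"), ("p2", 4.0, 1, "green"), ("p3", 32.0, 0, "red"),
    ("p4", 6.0, 1, "green"), ("p5", 40.0, 0, "red"), ("p6", 5.0, 1, "green"),
])

# Each monitored entity instance maps to one row of the training set.
train = pd.read_sql_query(
    "SELECT delivery_time_h, in_stock, kpi_class FROM metrics", con)

X, y = train.drop(columns="kpi_class"), train["kpi_class"]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # J48 stand-in
```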
21. KPI Dependency Analysis
[Figure: KPI dependency analysis, from monitor model to KPI dependency tree.]
Design time – Monitor Model (one metric is designated as the KPI metric):
– Choreography level: Order Fulfillment Lead Time, Delivery Time Shipment, …
– Orchestration level: Order In Stock, Delivery Time Supplier, Order Amount, Packaging Time, …
– Service level: Process infrastructure availability, response time of the banking service, …
KPI Analysis Model:
– Metric: Order Fulfillment Lead Time
– Target value: < 5 days → green; >= 5 days → red
– Analyzed instances: time window = last 2 months; filter: customerType=“gold”
– Metric set: M = {orchestration.all, qos.all}
– Algorithm: Classification Tree (J48)
Runtime – monitored metric values per process instance form the training set, e.g.:
KPI   | Deliv. Time Supplier | In Stock | Availability | …
Red   | 28 h                 | No       | 1.00         | …
Green | N/A                  | Yes      | 0.84         | …
Red   | 32 h                 | No       | 0.9          | …
Decision tree learning then produces the KPI dependency tree, with, e.g., “Order In Stock?” and “Delivery Time Supplier” as decision nodes leading to green and red leaves.
22. Prototype Implementation
Prototype is based on…
– Apache ODE (BPEL execution engine)
- Publishes events to JMS topics
– Standalone QoS monitor evaluates QoS metrics of services
– Monitoring Tool
- Based on the ESPER CEP framework
- Metrics DB in MySQL
- BAM dashboard as a Java Swing application
– Process Analyzer
- Uses the WEKA machine learning toolkit
24. Experimental Results
Generated tree for KPI = order fulfillment time (J48 algorithm)
Contains the expected influential metrics and produces suitable results ‘out of the box’
In our setting, decision tree generation based on 1000 instances takes about 30 seconds on a standard laptop computer
25. Experimental Results:
Drill-Down
Generated tree for KPI = “order in stock” (J48 algorithm)
Here, we perform “drill-down” analysis by setting the metric “order in stock”
as KPI
We want to understand which factors influence whether an order can be processed from stock
26. Experiment Results:
Differences between Algorithms
We have experimented with J48 and ADTree and generated trees for
different numbers of process instances (100, 400, 1000)
– ADTree algorithm produces bigger trees than J48 (third column: number of leaves and
nodes) for the same number of instances. However, it also reaches a higher precision
(last column: correctly classified instances).
– Both algorithms show very similar results concerning the displayed influential metrics. Typically only one or, at most, two (marginal) metrics differ.
27. Experiment Results:
Tree Size
Trees grow with the number of process instances
– For 400 instances J48 generated a tree with 11 nodes, for 1000 instances a tree with 18 nodes, while the precision improved by only 1%
– When the tree gets bigger, it shows factors which have only marginal influence, making the tree less readable (‘Displayed Metrics’ shows how many distinct metrics are displayed in the tree)
28. Experiment Results:
Tree Size (2)
To improve the readability…
– varying algorithm parameters led to only marginal changes in our experiments (for example, J48 -U with no pruning). The only parameter that turned out useful for reducing the size of the tree was ‘reduced error pruning’ (J48 -R); see the sketch after this list
– Another option, in the case of too many undesirable (marginal)
metrics, is to simply remove those metrics from the potential influential
factor metric set and repeat the analysis
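With scikit-learn as a stand-in for WEKA, comparable size-control knobs are cost-complexity pruning and minimum leaf size (the parameter values are illustrative assumptions, not the J48 options themselves):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for 1000 monitored process instances.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
# Cost-complexity pruning (ccp_alpha) plays a role similar to J48's pruning
# options: larger alpha removes branches with only marginal influence.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, min_samples_leaf=20,
                                random_state=0).fit(X, y)

print("unpruned leaves:", unpruned.get_n_leaves())
print("pruned leaves:  ", pruned.get_n_leaves())
```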
29. Some Important Earlier Work
We were not the first ones to have similar ideas
Important earlier work includes:
Castellanos, M., et al., 2005. iBOM: a platform for intelligent business operation management. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005). Washington, DC: IEEE Computer Society, 1084–1095.
Castellanos, M., Casati, F., Dayal, U., Shan, M.-C., 2004. A Comprehensive and Automated Approach to Intelligent Business Processes Execution Analysis. Distributed and Parallel Databases, vol. 16, no. 3, pp. 239–273.
30. Main Advances Over Earlier Work
The S-Cube approach to KPI analysis based on event logs
improves on earlier work in some important aspects:
– KPI Dependency Analysis incorporates both process-level metrics and
QoS metrics
– Semi-automated generation of potential influential metric definitions for
WS-BPEL processes
– Many different algorithms can be used for analysis
- Courtesy of the WEKA backend
31. Discussion - Advantages
The KPI Dependency Analysis based on decision trees has a
number of clear advantages …
– Simplicity – the basic approach is relatively easy to understand; the generated trees can be understood also by non-IT users
– Efficiency – the analysis of influential factors is “automated”; the traditional approach is to manually pose analysis questions using OLAP queries over data marts, which is much more time-consuming
– Proven in the real world – machine learning is by now a proven
technique that has been successfully applied in many areas
32. Discussion - Disadvantages
… but of course the approach also has some disadvantages.
– Bootstrapping problem – the approach assumes that some recorded
historical event logs are available for training
– Necessary domain knowledge – in order to define the potential
influential metric set some domain knowledge is necessary
– Availability of monitoring data – one of the basic assumptions of the
approach is that all necessary data can be monitored (if this is not the
case the approach cannot be used)
34. Summary
Classification learning based techniques can be used to
explain performance problems in service compositions
Steps:
1. Define a KPI and a set of potential influential metrics
2. Monitor all metrics for a set of process instances
3. Train a decision tree from the historical event log
The created KPI dependency tree explains the dependencies between the KPI classes and a set of lower-level process metrics and QoS metrics
35. Further S-Cube Reading
Wetzstein, B.; Leitner, P.; Rosenberg, F.; Brandic, I.; Dustdar, S.; Leymann, F.: Monitoring and Analyzing Influential Factors of Business Process Performance. In: Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC 2009). IEEE Press, Piscataway, NJ, USA, 118–127.
Wetzstein, B.; Leitner, P.; Rosenberg, F.; Dustdar, S.; Leymann, F.: Identifying Influential Factors of Business Process Performance Using Dependency Analysis. In: Enterprise Information Systems, Vol. 5(1), Taylor & Francis, 2010.
Kazhamiakin, R.; Wetzstein, B.; Karastoyanova, D.; Pistore, M.; Leymann, F.: Adaptation of Service-Based Applications Based on Process Quality Factor Analysis. In: Proceedings of the 2nd Workshop on Monitoring, Adaptation and Beyond (MONA+), co-located with ICSOC/ServiceWave 2009.
Leitner, P.; Wetzstein, B.; Rosenberg, F.; Michlmayr, A.; Dustdar, S.; Leymann, F.: Runtime Prediction of Service Level Agreement Violations for Composite Services. In: Proceedings of the 2009 International Conference on Service-Oriented Computing (ICSOC/ServiceWave 2009). Springer-Verlag, Berlin, Heidelberg, 176–186.
Leitner, P.; Michlmayr, A.; Rosenberg, F.; Dustdar, S.: Monitoring, Prediction and Prevention of SLA Violations in Composite Services. In: Proceedings of the 2010 IEEE International Conference on Web Services (ICWS 2010). IEEE Computer Society, Washington, DC, USA, 369–376.