This document describes a method for analyzing dependencies between Key Performance Indicators (KPIs) and lower-level metrics in business processes. It involves defining KPIs and metrics, monitoring process instances, and using classification algorithms like decision trees to learn relationships between metrics and KPI classes from historical data. The approach automates dependency analysis, is efficient compared to manual methods, and produces understandable decision tree models. Potential limitations include needing historical event logs to train models and ensuring all relevant data can be monitored.
Analyzing Business Process Performance with KPI Dependency Trees
1. S-Cube Learning Package
Analyzing Business Process Performance
Using KPI Dependency Analysis
University of Stuttgart (USTUTT), TU Wien (TUW)
Branimir Wetzstein, USTUTT
www.s-cube-network.eu
2. Learning Package Categorization
S-Cube → Adaptable Coordinated Service Compositions → Adaptable and QoS-aware Service Compositions → Analyzing Business Process Performance Using KPI Dependency Analysis
4. Let’s Consider a Scenario (1)
Assume we have implemented a business process as a service orchestration
It is a reseller process which interacts with external services of the customer, suppliers, bank, and shipper, and with internal services such as the warehouse
5. Let’s Consider a Scenario (2)
We are interested in measuring the performance of the business process (time, cost, quality, customer satisfaction)
This is done by defining Key Performance Indicators (KPIs), which specify target values on key metrics based on business goals
– A KPI target value function maps metric value ranges to KPI classes (e.g., “good”, “medium”, “bad”); a sketch follows after this list
Some typical KPI metrics in our scenario
– Order Fulfillment Lead Time
– Perfect Order Fulfillment (in time and in full)
– Customer Complaint Rate
– Availability of the reseller service
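A minimal sketch of such a target value function in Python (the class names are from this slide; the thresholds are borrowed from the KPI definition example later in the deck):

```python
def order_fulfillment_kpi_class(lead_time_days: float) -> str:
    """Map the KPI metric 'Order Fulfillment Lead Time' to a KPI class.

    Thresholds follow the later example (m < 2 days, 2-4 days, otherwise).
    """
    if lead_time_days < 2:
        return "good"
    elif lead_time_days < 4:
        return "medium"
    else:
        return "bad"
```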
6. Let’s Consider a Scenario (3)
In the first step, KPIs are monitored at process runtime for a
set of process instances (what?)
If monitoring of KPIs shows unsatisfying results, we want to
be able to analyze and explain the violations (why?)
That is not trivial as a KPI often depends on many influential
factors measured by lower-level metrics
[Figure: the KPI “Order Fulfillment Lead Time” measures the Purchase Order Process; it depends on process performance metrics (PPMs such as availability in stock, customer, products, …) and on QoS metrics (such as service availability and response time).]
9. Architectural Overview (2)
Model and deploy the business process (e.g., in WS-BPEL)
Define and monitor a set of KPIs and potential influential
metrics
– Event-based monitoring based on CEP
– Supporting in particular both process events and QoS events and their
correlation
Train a decision tree (KPI Dependency Tree) from monitored
data
– Gather monitored data from Metrics DB
– Classify the monitored process instances according to their KPI class
– Use Decision Tree Learning Algorithms to learn the dependencies of
the KPI and the lower-level metrics
10. Background:
Event-Based Monitoring
In order to be used for analysis, runtime data needs to be monitored
Event-based monitoring is a commonly used approach to implement this
Basic principle:
Register for and receive some lifecycle events from the service
composition and use Complex Event Processing (CEP) to extract,
correlate and aggregate monitoring data from raw event data
Can be used to monitor both QoS and domain-specific data
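A minimal sketch of this principle in plain Python, standing in for a CEP engine such as ESPER (the event shapes and field names are illustrative assumptions):

```python
from collections import defaultdict

# Hypothetical raw lifecycle events; field names are assumptions.
events = [
    {"instance": "p1", "type": "process_started", "ts": 0.0},
    {"instance": "p1", "type": "service_invoked", "service": "shipper", "response_time": 1.2},
    {"instance": "p1", "type": "process_ended", "ts": 30.5},
    {"instance": "p2", "type": "process_started", "ts": 5.0},
    {"instance": "p2", "type": "process_ended", "ts": 98.0},
]

# Correlate raw events per process instance (what CEP correlation rules do).
by_instance = defaultdict(list)
for ev in events:
    by_instance[ev["instance"]].append(ev)

# Aggregate: derive a duration metric per instance from its lifecycle events.
for inst, evs in by_instance.items():
    start = next(e["ts"] for e in evs if e["type"] == "process_started")
    end = next(e["ts"] for e in evs if e["type"] == "process_ended")
    print(inst, "duration:", end - start)
```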
11. Background:
Monitoring of Service Orchestrations
Our monitoring approach for service orchestrations supports:
– Process Performance Metrics (PPMs) based on process events (BPEL event model)
– QoS metrics based on QoS events provided by QoS monitors
– Correlation of Process events and QoS events
– Metric calculation based on Complex Event Processing (ESPER)
[Figure: monitoring architecture. The process engine and a QoS monitor publish events; an event listener service feeds them to the Complex Event Processing engine, which evaluates the metric definitions and writes the resulting metrics to the metrics database and a dashboard.]
12. KPI Dependency Analysis - Motivation
So far we are able to monitor metrics and find out which KPI targets are
violated (what?)
In the next step, we want to explain the violations (why?)
That is not trivial as a KPI often depends on many influential factors
measured by lower-level metrics
Typically, such an analysis is done manually (if at all) by a business analyst using OLAP queries on a data warehouse
– that is very cumbersome and time-consuming
– we want to “discover” the problems in an automated way
– therefore we can use data mining techniques
In particular, we construct a classification problem and use existing
classification learning techniques (decision trees)
13. Background:
Machine Learning and Data Mining
Automated discovery of interesting patterns from large
amounts of data (stored typically in data warehouses)
– Manual discovery (e.g., by using OLAP queries) could take days or
weeks
Functionalities include:
– Mining of association rules, correlation analysis
– Classification and Prediction – our focus here!
– Clustering
– Time-series analysis
– Graph mining and text mining
Interdisciplinary field using techniques from machine learning,
statistics, pattern recognition, data visualization
14. Background:
Classification Learning (1)
Given: a (historical) dataset containing a set of instances described in
terms of:
– A set of explanatory (a.k.a. predictive) categorical or numerical attributes
– A categorical target attribute (a.k.a. class)
Goal: based on the historical dataset (“supervised learning”) create a
classification model which helps…
– explaining the dependencies between the class and the explanatory attributes in historical data (interpretation)
– making predictions about future data, i.e., predicting the class from future explanatory attribute values (prediction)
Some Classification Learning techniques:
– Decision Trees
– Classification Rules
– Support Vector Machines
15. Background:
Classification Learning (2)
[Figure: classification learning workflow. Training phase: a classification algorithm learns a classification model from training data. Test phase: the model is evaluated on test data. Prediction phase: explanatory attribute values of new data are fed to the model, which outputs the predicted class. Interpretation phase: the model itself is inspected as knowledge.]
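These phases can be sketched with scikit-learn’s decision tree, a stand-in for the WEKA toolkit used in the prototype (the toy dataset is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy historical dataset: explanatory attributes X, categorical target y.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Split historical data into training and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # training phase
print("test accuracy:", model.score(X_test, y_test))                  # test phase
print("predicted class:", model.predict(X_test[:1]))                  # prediction phase
```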
16. Background:
Decision Tree Learning
[Figure: example decision tree.]
– A non-leaf node represents an explanatory categorical or numeric attribute (A1, A2, …)
– Outgoing edges represent conditions on the parent node’s explanatory attribute values (e.g., < 2, 2–4, > 4)
– A leaf node represents a target attribute class (C1, C2, …) and shows the corresponding number of instances from the training set (e.g., 80/2: 80 instances reach the leaf, 2 of them misclassified)
– A path shows which attribute values lead to a certain class
17. KPI Dependency Analysis
The KPI class of a process instance (alt. choreography instance, activity
instance, business object, …) depends on a set of influential factors
(PPMs and QoS metrics)
For finding out those dependencies, we use classification learning:
– The data set consists of a set of (historical) process instances; for each
process instance the KPI class and a set of metrics is evaluated
– The KPI is the target attribute which maps values of the underlying metric to
categorical values (KPI classes)
– The potential influential lower-level metrics are the explanatory attributes
(predictive variables)
– Goal: Based on a set of monitored instances, create a classification model (decision tree) that identifies recurring relationships among the explanatory attributes describing the instances belonging to the same KPI class
The decision tree (KPI dependency tree) can be used to explain the KPI classes of past process instances and also to predict the class of process instances for which only the values of some of the lower-level metrics are known
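A compact sketch of this classification setup, using pandas and scikit-learn’s DecisionTreeClassifier in place of the prototype’s WEKA/J48 backend (metric names and values are illustrative assumptions):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row is a monitored process instance: lower-level metrics
# (explanatory attributes) plus the evaluated KPI class (target attribute).
data = pd.DataFrame({
    "delivery_time_supplier_h": [28, 4, 32, 6, 40, 5],
    "order_in_stock":           [0, 1, 0, 1, 0, 1],   # 1 = yes, 0 = no
    "service_availability":     [1.00, 0.84, 0.90, 0.99, 0.95, 0.97],
    "kpi_class":                ["red", "green", "red", "green", "red", "green"],
})

X = data.drop(columns="kpi_class")   # explanatory attributes (metrics)
y = data["kpi_class"]                # target attribute (KPI class)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Interpretation: the printed tree is the KPI dependency tree.
print(export_text(tree, feature_names=list(X.columns)))

# Prediction: classify a new instance from its lower-level metric values.
new_instance = pd.DataFrame([{"delivery_time_supplier_h": 30,
                              "order_in_stock": 0,
                              "service_availability": 0.9}])
print(tree.predict(new_instance))
```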
18. Defining KPIs
The KPI Definition includes:
– KPI metric definition (e.g., order fulfillment time)
– A set of categorical values defining the KPI classes, at least 2 (e.g.,
“green”, “yellow”, “red”)
– Target value function mapping KPI metric values to KPI classes (e.g., m < 2 days → green, 2 days ≤ m < 4 days → yellow, otherwise red); a code sketch of such a definition follows after this list
The KPI metric is specified for a monitored entity type:
– Process Instance (e.g., duration of a reseller process instance)
– Activity Instance (e.g., duration of the supplier service invocation)
– Choreography Instance (e.g., duration)
– Service endpoint (e.g., availability)
– Set of Process Instances per day (e.g., average duration)
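One possible way to capture such a KPI definition in code (a hypothetical structure for illustration, not part of the prototype):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class KPIDefinition:
    metric: str                               # KPI metric, e.g. "order fulfillment time"
    entity_type: str                          # monitored entity type, e.g. "process_instance"
    classes: tuple                            # KPI classes, at least 2
    target_function: Callable[[float], str]  # maps KPI metric values to classes

def fulfillment_target(days: float) -> str:
    # Target value function from this slide: m < 2 days -> green,
    # 2 days <= m < 4 days -> yellow, otherwise red.
    if days < 2:
        return "green"
    if days < 4:
        return "yellow"
    return "red"

kpi = KPIDefinition(
    metric="order fulfillment time",
    entity_type="process_instance",
    classes=("green", "yellow", "red"),
    target_function=fulfillment_target,
)
print(kpi.target_function(3))  # -> "yellow"
```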
19. Generating Metric Definitions
A set of metric definitions (representing potential influential factors) can be
automatically generated
We support rules to automatically generate the following metrics (a sketch follows after this list):
– Service invocation:
- availability and response time of invoked service (both for synchronous
and asynchronous invocations (invoke-receive))
– WS-BPEL invoke activity (other basic activities are not interesting for long-running processes):
- execution time of the activity (i.e. the time between starting and finishing
the activity)
- If part of a loop, in addition:
- Average/minimum/maximum execution time per process instance
- number of executions per process instance
– For every branching activity and for fault, compensation, and event handlers, we generate a metric representing the branch that has been executed
Metrics based on process variable data elements are created manually
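A sketch of such generation rules over a hypothetical, simplified process-model representation (the actual prototype works on WS-BPEL definitions):

```python
# Hypothetical, simplified process model: activity name, type, loop flag.
activities = [
    {"name": "invokeSupplier", "type": "invoke", "in_loop": True},
    {"name": "invokeShipper",  "type": "invoke", "in_loop": False},
    {"name": "checkStock",     "type": "if",     "in_loop": False},
]

def generate_metric_definitions(activities):
    metrics = []
    for act in activities:
        if act["type"] == "invoke":
            # Availability and response time of the invoked service,
            # plus the execution time of the invoke activity itself.
            metrics += [f"{act['name']}.availability",
                        f"{act['name']}.response_time",
                        f"{act['name']}.execution_time"]
            if act["in_loop"]:
                # Aggregates per process instance for looped invocations.
                metrics += [f"{act['name']}.execution_time.avg",
                            f"{act['name']}.execution_time.min",
                            f"{act['name']}.execution_time.max",
                            f"{act['name']}.executions_per_instance"]
        elif act["type"] in ("if", "switch", "faultHandler", "eventHandler"):
            # Record which branch / handler was executed.
            metrics.append(f"{act['name']}.executed_branch")
    return metrics

print(generate_metric_definitions(activities))
```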
20. Data Preparation and Learning
Create a KPI Analysis Model
– Select KPI + a set of potential influential factors
Gather metric values of monitored entity instances and create a training set:
– Each monitored entity instance with its KPI class and influential metric values maps to a row in the training set
A decision tree is learned (e.g., using the J48 algorithm)
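A sketch of this data-preparation step, with an in-memory sqlite3 database standing in for the prototype’s MySQL metrics DB (schema, table, and column names are assumptions):

```python
import sqlite3
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed schema: one row of metric values + KPI class per monitored instance.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE metrics (
    instance_id TEXT, delivery_time_h REAL, in_stock INTEGER, kpi_class TEXT)""")
con.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", [
    ("p1", 28.0, 0, "red"), ("p2", 4.0, 1, "green"), ("p3", 32.0, 0, "red"),
    ("p4", 6.0, 1, "green"), ("p5", 40.0, 0, "red"), ("p6", 5.0, 1, "green"),
])

# Each monitored entity instance maps to one row of the training set.
train = pd.read_sql_query(
    "SELECT delivery_time_h, in_stock, kpi_class FROM metrics", con)

X, y = train.drop(columns="kpi_class"), train["kpi_class"]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # J48 stand-in
```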
21. KPI Dependency Analysis
[Figure: KPI dependency analysis, from monitor model to KPI dependency tree.]
Design time – Monitor Model (one metric is designated as the KPI metric):
– Choreography level: Order Fulfillment Lead Time, Delivery Time Shipment, …
– Orchestration level: Order In Stock, Delivery Time Supplier, Order Amount, Packaging Time, …
– Service level: Process infrastructure availability, response time of the banking service, …
KPI Analysis Model:
– Metric: Order Fulfillment Lead Time
– Target value: < 5 days → green; >= 5 days → red
– Analyzed instances: time window = last 2 months; filter: customerType=“gold”
– Metric set: M = {orchestration.all, qos.all}
– Algorithm: Classification Tree (J48)
Runtime – monitored metric values per process instance form the training set, e.g.:
KPI   | Deliv. Time Supplier | In Stock | Availability | …
Red   | 28 h                 | No       | 1.00         | …
Green | N/A                  | Yes      | 0.84         | …
Red   | 32 h                 | No       | 0.9          | …
Decision tree learning then produces the KPI dependency tree, with, e.g., “Order In Stock?” and “Delivery Time Supplier” as decision nodes leading to green and red leaves.
22. Prototype Implementation
Prototype is based on…
– Apache ODE (BPEL execution engine)
- Publishes events to JMS topics
– Standalone QoS monitor evaluates QoS metrics of services
– Monitoring Tool
- Based on the ESPER CEP framework
- Metrics DB in MySQL
- BAM dashboard as a Java Swing application
– Process Analyzer
- Uses the WEKA machine learning toolkit
24. Experimental Results
Generated tree for KPI = order fulfillment time (J48 algorithm)
Contains the expected influential metrics and produces suitable results ‘out of the box’
In our setting, decision tree generation based on 1000 instances takes about 30 seconds on a standard laptop computer
25. Experimental Results:
Drill-Down
Generated tree for KPI = “order in stock” (J48 algorithm)
Here, we perform “drill-down” analysis by setting the metric “order in stock”
as KPI
We want to understand which factors influence whether an order can be processed from stock
26. Experiment Results:
Differences between Algorithms
We have experimented with J48 and ADTree and generated trees for
different numbers of process instances (100, 400, 1000)
– ADTree algorithm produces bigger trees than J48 (third column: number of leaves and
nodes) for the same number of instances. However, it also reaches a higher precision
(last column: correctly classified instances).
– Both algorithms show very similar results concerning the displayed influential metrics. Typically only one or, at most, two (marginal) metrics differ.
27. Experiment Results:
Tree Size
Trees grow with the number of process instances
– For 400 instances J48 generated a tree with 11 nodes, for 1000 instances a tree with 18 nodes, while the precision improved by only 1%
– When the tree gets bigger, it shows factors which have only marginal influence, making the tree less readable (‘Displayed Metrics’ shows how many distinct metrics are displayed in the tree)
28. Experiment Results:
Tree Size (2)
To improve the readability…
– varying algorithm parameters led to only marginal changes in our experiments (for example, J48 -U with no pruning). The only parameter that turned out useful for reducing the size of the tree was ‘reduced error pruning’ (J48 -R); see the sketch after this list
– Another option, in the case of too many undesirable (marginal)
metrics, is to simply remove those metrics from the potential influential
factor metric set and repeat the analysis
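With scikit-learn as a stand-in for WEKA, comparable size-control knobs are cost-complexity pruning and minimum leaf size (the parameter values are illustrative assumptions, not the J48 options themselves):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for 1000 monitored process instances.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
# Cost-complexity pruning (ccp_alpha) plays a role similar to J48's pruning
# options: larger alpha removes branches with only marginal influence.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, min_samples_leaf=20,
                                random_state=0).fit(X, y)

print("unpruned leaves:", unpruned.get_n_leaves())
print("pruned leaves:  ", pruned.get_n_leaves())
```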
29. Some Important Earlier Work
We were not the first ones to have similar ideas
Important earlier work includes:
Castellanos, M., et al., 2005. iBOM: a platform for intelligent business operation management. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005). Washington, DC: IEEE Computer Society, 1084–1095.
Castellanos, M., Casati, F., Dayal, U., Shan, M.-C., 2004. A Comprehensive and Automated Approach to Intelligent Business Processes Execution Analysis. Distributed and Parallel Databases, vol. 16, no. 3, pp. 239–273.
30. Main Advances Over Earlier Work
The S-Cube approach to KPI analysis based on event logs
improves on earlier work in some important aspects:
– KPI Dependency Analysis incorporates both process-level metrics and
QoS metrics
– Semi-automated generation of potential influential metric definitions for
WS-BPEL processes
– Many different algorithms can be used for analysis
- Courtesy of the WEKA backend
31. Discussion - Advantages
The KPI Dependency Analysis based on decision trees has a
number of clear advantages …
– Simplicity – the basic approach is relatively easy to understand; the generated trees can be understood also by non-IT users
– Efficiency – the analysis of influential factors is “automated”; the traditional approach is to manually pose analysis questions using OLAP queries over data marts, which is much more time-consuming
– Proven in the real world – machine learning is by now a proven
technique that has been successfully applied in many areas
32. Discussion - Disadvantages
… but of course the approach also has some disadvantages.
– Bootstrapping problem – the approach assumes that some recorded
historical event logs are available for training
– Necessary domain knowledge – in order to define the potential
influential metric set some domain knowledge is necessary
– Availability of monitoring data – one of the basic assumptions of the
approach is that all necessary data can be monitored (if this is not the
case the approach cannot be used)
34. Summary
Classification learning based techniques can be used to
explain performance problems in service compositions
Steps:
1. Define a KPI and a set of potential influential metrics
2. Monitor all metrics for a set of process instances
3. Train a decision tree from the historical event log
The created KPI dependency tree explains the dependencies between the KPI classes and a set of lower-level process metrics and QoS metrics
35. Further S-Cube Reading
Wetzstein, B.; Leitner, P.; Rosenberg, F.; Brandic, I.; Dustdar, S.; Leymann, F.: Monitoring and Analyzing Influential Factors of Business Process Performance. In: Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC 2009). IEEE Press, Piscataway, NJ, USA, 118–127.
Wetzstein, B.; Leitner, P.; Rosenberg, F.; Dustdar, S.; Leymann, F.: Identifying Influential Factors of Business Process Performance Using Dependency Analysis. In: Enterprise Information Systems, Vol. 5(1), Taylor & Francis, 2010.
Kazhamiakin, R.; Wetzstein, B.; Karastoyanova, D.; Pistore, M.; Leymann, F.: Adaptation of Service-Based Applications Based on Process Quality Factor Analysis. In: Proceedings of the 2nd Workshop on Monitoring, Adaptation and Beyond (MONA+), co-located with ICSOC/ServiceWave 2009.
Leitner, P.; Wetzstein, B.; Rosenberg, F.; Michlmayr, A.; Dustdar, S.; Leymann, F.: Runtime Prediction of Service Level Agreement Violations for Composite Services. In: Proceedings of the 2009 International Conference on Service-Oriented Computing (ICSOC/ServiceWave 2009). Springer-Verlag, Berlin, Heidelberg, 176–186.
Leitner, P.; Michlmayr, A.; Rosenberg, F.; Dustdar, S.: Monitoring, Prediction and Prevention of SLA Violations in Composite Services. In: Proceedings of the 2010 IEEE International Conference on Web Services (ICWS 2010). IEEE Computer Society, Washington, DC, USA, 369–376.