This document describes a performance comparison study of two BPMN 2.0 workflow management systems (WfMSs) across multiple minor versions. The study involved: 1) Testing the two WfMSs using different workload mixes derived from real business process models, 2) Measuring performance metrics like resource consumption, 3) Analyzing the evolution and performance of the two WfMSs over their last four minor versions from 2014-2016. The document also outlines a novel method for synthesizing representative workload mixes from a collection of business process models.
BPMDS'17 - Performance Comparison Between BPMN 2.0 WfMS Versions
1. Performance Comparison Between BPMN 2.0 Workflow Management Systems Versions
Vincenzo Ferme, Ana Ivanchikj, Cesare Pautasso
USI, Lugano, Switzerland
Marigianna Skouradaki, Frank Leymann
University of Stuttgart, Germany
Institute of Architecture of Application Systems
15. Paper Contributions
Insights on the evolution of two WfMSs in their last 4 minor versions in terms of performance and resource consumption
A novel method for deriving a synthetic workload mix from a given BP models collection
[Figure: reoccurring structures (sequencePattern, parallelSplitPattern) built from empty script tasks, composed into workload_model_1 with tasks T1 and T2]
18. WfMSs and Versions
Vincenzo Ferme, benchflow [BPM '15] [ICPE '16]
(2014-2016)
WfMS A: v 7.2.0, v 7.3.0, v 7.4.0, v 7.5.0 (official Docker image, default config.)
WfMS B: v 5.18.0, v 5.19.0.2, v 5.20.0, v 5.21.0 (popular Docker image, vendor config.)
70. TAKE AWAY
Newer version ≠ better performance
Differences are more evident with more users!
71. Load function? Workload mix? Metrics?
Future work
• domain specific models
• support for events / human tasks / external interaction
• event logs for realistic load functions
• more WfMSs with different configuration settings
• maybe new metrics of interest
Good morning everyone, today I am going to present a paper written by myself, Ana Ivanchikj, with Vincenzo Ferme and Cesare Pautasso from the Faculty of Informatics at USI, Lugano, Switzerland, and Marigianna Skouradaki and Frank Leymann from the Institute of Architecture of Application Systems in Stuttgart, Germany. The two groups have been brought together by the BenchFlow project, and in this paper we compare the performance of different versions of BPMN 2.0 Workflow Management Systems.
Semantic versioning has definitely facilitated understanding what type of change a new software version brings. Is it a breaking change that calls for caution, is it a cool new feature,
or is it a fix for that annoying bug we have been waiting for?
In the era of continuous improvements, these numbers are changing fast. How do we know whether it is time to upgrade? What do we need to consider?
So when 1.3.1 goes to 1.4.1 we know it is not a breaking change, and that it brings new functionality. If it is a cool functionality we are happy. But is that enough to make a decision? What about performance? What about resource consumption? Have they changed in the new version? Do we care about it? Well, if we are running our system in the cloud or for a huge number of instances, then we should probably care about it and test it.
Before testing the performance, a company has a lot of questions it needs to consider. First and foremost, should it test a new WfMS or just the existing one? Which versions should be included in the test? Then, which business processes should be deployed in the WfMS? How many users does the company have, or plan to have, and how frequently do they start new process instances? And last but not least, which metrics are important for the company? Is it the throughput, is it the resource consumption?
All of these questions have to be answered based on the company's experience and plans.
So once the company decides on the previously mentioned questions, it needs to run the tests, and this is where BenchFlow comes into play. BenchFlow is an end-to-end framework for WfMS performance testing relying on Faban and Docker. The framework handles the deployment of the WfMS (click) and of the business processes to be used in the tests (click). It allows defining the load function (click) and collects execution data to calculate metrics (click).
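BenchFlow's actual configuration format and API are not shown in this talk, so the following Python sketch is purely illustrative of those four steps; the image name, port, deployment endpoint, and the drive_load/collect_metrics callbacks are hypothetical placeholders.

```python
# Purely illustrative sketch of the testing pipeline described above; this is NOT
# BenchFlow's actual API. Image name, port, and endpoint are hypothetical.
import docker    # Docker SDK for Python
import requests

def run_performance_test(wfms_image, bp_model_paths, drive_load, collect_metrics):
    client = docker.from_env()

    # 1) deploy the WfMS under test as a Docker container
    wfms = client.containers.run(wfms_image, detach=True, ports={"8080/tcp": 8080})
    try:
        # 2) deploy the BP models to be used in the tests (hypothetical REST endpoint)
        for path in bp_model_paths:
            with open(path, "rb") as f:
                requests.post("http://localhost:8080/deployments", files={"file": f})

        # 3) apply the load function (users, think time, ramp-up/steady/ramp-down);
        #    BenchFlow delegates load generation to Faban, here it is just a callback
        drive_load()

        # 4) collect execution data and compute the metrics
        return collect_metrics()
    finally:
        wfms.remove(force=True)
```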
In this paper we:
help companies to derive a synthetic workload mix from their BP models collection and
Provide insights on the evolution of two WfMSs in their last 4 minor versions in terms of performance and resource consumption
Now we are going to go through, one by one, our decisions on all the questions that needed to be answered before starting the performance testing.
First, let's look at the WfMSs we have decided to test.
We selected two well-known open source workflow management systems, the names of which we cannot publish due to lack of explicit consent from the vendors. We deployed them on BenchFlow using the official Docker image and the default configuration for WfMS A, and a popular Docker image and the recommended vendor configuration for WfMS B.
We cover two years of development (2014-2016), i.e., versions 7.2.0 through 7.5.0 for WfMS A and versions 5.18.0 through 5.21.0 for WfMS B.
Now we are going to propose a method for deriving a workload mix and apply it.
Instead of using arbitrary BP models in performance tests, companies may generate synthetic BP models that reflect the essence of their BP models collection, and here is one possible method to do so. It takes as input a collection of business processes and consists of four steps: analyze the collection, extract reoccurring structures, synthesize representative BPs, and define the workload mix. We will now go into the details of each of them.
So once we have the company's BP collection (click),
in the first phase of the method we analyze the collection using statistical metrics (click) about the size and the structure of the business processes in the collection, and we cluster (click) the processes based on these static metrics.
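Neither the clustering algorithm nor the exact static metrics are named in the talk, so the sketch below simply assumes k-means over a few common size and structure features as one plausible way to realize this phase; the feature values are invented.

```python
# Minimal sketch of Phase 1: the talk does not name the clustering algorithm,
# so k-means and the chosen static metrics here are assumptions for illustration.
from sklearn.cluster import KMeans
import numpy as np

# one row per BP model, e.g. [number of nodes, number of gateways, nesting depth]
static_metrics = np.array([
    [5, 0, 0],
    [12, 2, 1],
    [32, 6, 2],
    [80, 14, 4],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(static_metrics)
print(kmeans.labels_)  # cluster assignment for each model in the collection
```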
In the second phase we extract reoccurring structures using the ROSE algorithm. The detected patterns are extracted and annotated (click) with their frequency of appearance in the original collection, as well as other metadata regarding their structure.
For synthesizing representative BPs (click) in Phase 3, we use the clusters identified in Phase 1 to determine the size and the control flow characteristics of the process to be synthesized, and then we use the extracted structures (click) and their metadata to synthesize the process.
Finally, in Phase 4, for each of the models synthesized in Phase 3 we define the intensity, i.e., the percentage of all started instances that are started from the given model. The intensity is based on the frequency of occurrence in the collection of the segments which comprise the business process model. We call the combination of a BP model and its intensity a workload class. All classes together comprise the workload mix, which, when generated with this methodology, is representative of the given collection.
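The talk does not spell out the exact rule for turning segment frequencies into intensities, so the following sketch is only one plausible reading: intensities proportional to the summed collection frequency of the structures each synthesized model is built from. All structure names, model names, and frequencies are invented for illustration.

```python
# Sketch of Phase 4: intensity of each synthesized model assumed proportional to the
# total collection frequency of the reoccurring structures it is composed of.

# frequency of appearance of each extracted structure in the original collection
segment_frequency = {"sequencePattern": 120, "parallelSplitPattern": 45, "loopPattern": 15}

# which structures each synthesized BP model is composed of (hypothetical models)
workload_models = {
    "workload_model_1": ["sequencePattern"],
    "workload_model_2": ["parallelSplitPattern"],
    "workload_model_3": ["sequencePattern", "loopPattern"],
}

raw_weight = {m: sum(segment_frequency[s] for s in segments)
              for m, segments in workload_models.items()}
total = sum(raw_weight.values())

# workload mix: each class = (BP model, intensity), intensities sum to 100%
workload_mix = {m: round(100 * w / total, 1) for m, w in raw_weight.items()}
print(workload_mix)
```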
To create a case study scenario, we use a collection which initially counted approximately 14,000 models but also included invalid and incomplete processes. As most of the models in the collection were in BPMN, we decided to focus on them, with models from IBM, the BPM Academic Initiative, and the BPMN standard itself. After cleaning and transforming the processes, we ended up with 3,247 models on which we applied the workload mix generation method.
As can be seen from the histogram (click), the BP size in the collection ranges from 3 to 120 nodes. Models (click) with a size between 5 and 32 nodes represent 82% of the collection. So we set the minimum number of nodes in an identified reoccurring pattern to 5.
The performed clustering analysis resulted in six clusters of gradually increasing complexity. The first four clusters represent 94% of the collection. Therefore, we consider the first four clusters to be the most representative of the collection's structure. They are shown in the table, and we can see that they gradually become larger and more complex.
To extract the reoccurring structures we used the ROSE algorithm; the work on this algorithm was presented at ICWS and OTM last year. The analysis of the models resulted in a set of 143 reoccurring structures with a size of over 5 nodes.
Based on the clusters, we decided on the structural attributes a BP in a class should have, and then used the detected recurring structural patterns to synthesize the representative BPs. (Click) The first three processes are simpler, built from just one reoccurring structure, but have the greatest intensities, as they are made of a structure which appears frequently in the collection.
The other two BPs are larger and more complex, made out of two structural patterns, but with smaller intensities.
OK, so we have defined a workload mix that is representative of the company. Now let's take a look at the load function.
For running the performance experiments on our generated workload mix, we decided to use 50, 500, and 1'000 simulated users in three different experiments to reflect companies of different sizes. We used a think time of 1 sec between instances, a ramp-up period (30 sec) during which the number of instantiated BP instances is gradually increased, followed by a steady state (10 min), where the number of instantiated BP instances remains stable, and a ramp-down period (30 sec), during which the number of instantiated BP instances is gradually decreased.
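As a compact summary of this load function (this is not the actual Faban or BenchFlow configuration syntax, just plain Python data for illustration), the three experiments can be written down as:

```python
# Load function parameters as described above; one entry per experiment.
EXPERIMENTS = [
    {
        "users": users,          # 50, 500 and 1000 simulated users
        "think_time_s": 1,       # pause between two instance starts per user
        "ramp_up_s": 30,         # load gradually increased
        "steady_state_s": 600,   # 10 minutes of stable load, used for the metrics
        "ramp_down_s": 30,       # load gradually decreased
    }
    for users in (50, 500, 1000)
]
```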
So the last thing we need to decide on before actually running the tests are the metrics.
We calculate three sets of metrics: client side and server side performance metrics and resource consumption metrics.
On the client side we have the number of requests per second; on the server side the BP instance duration and throughput; and in the resource consumption metrics we look at both CPU and RAM. I will explain them in more detail on dedicated slides.
So now we are ready to execute the performance tests and analyze the results. We executed each experiment three times to ensure the reliability of data.
And here are the results. You can’t read them can you? Well it is ok, you can read them in the paper, now we are going to visualize them.
For the client side performance we look at the number of requests per second issued by the simulated users. The labels show the 95% confidence interval between the three trials.
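As an illustration of how such an interval over three trials can be obtained, here is a minimal sketch assuming a Student-t confidence interval; whether the paper computes it exactly this way is an assumption, and the sample values are invented.

```python
# Sketch: 95% confidence interval of a metric measured in three trials.
import numpy as np
from scipy import stats

trials = np.array([118.2, 121.5, 119.7])         # requests/s measured in the 3 trials
mean = trials.mean()
sem = stats.sem(trials)                           # standard error of the mean
half_width = sem * stats.t.ppf(0.975, df=len(trials) - 1)
print(f"{mean:.1f} ± {half_width:.1f} req/s (95% CI)")
```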
This metric is relatively stable between all versions of both WfMSs when tested with 50 and 500 users. However, there is a more substantial decrease when tested with 1’000 users, especially in newer versions of the WfMS A. The actual value depends mainly on the WfMS’s response time.
If one expects performance improvements with new system releases, these results could be surprising given that, with respect to these metrics, older versions show better performance than newer versions.
These metrics are computed based on the server-side performance data logged by the WfMSs. The duration is defined as the time interval between the start and the completion of a BP instance. We report the weighted average aggregated over all the executed BP instances, where we compute the weights based on the number of executed BP instances in each trial. As throughput we define the number of executed BP instances per second.
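A minimal sketch of this aggregation, with invented numbers; treating the throughput as executed instances divided by the steady-state duration is an assumption of the sketch, not a detail given in the talk.

```python
# Sketch of the server-side aggregation: weighted average of the per-trial mean
# BP instance duration, with the number of executed instances as weights.
import numpy as np

mean_duration_per_trial = np.array([5.6, 5.8, 5.5])    # mean BP instance duration per trial
instances_per_trial = np.array([14800, 15050, 14910])  # executed BP instances per trial

weighted_avg_duration = np.average(mean_duration_per_trial, weights=instances_per_trial)

# throughput: executed BP instances per second during the 10-minute steady state
steady_state_s = 600
throughput_per_trial = instances_per_trial / steady_state_s

print(weighted_avg_duration, throughput_per_trial.mean())
```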
The performance decrease with newer versions is made even more evident by the BP instance duration metric. The average duration of a BP instance increases as new versions are introduced for both WfMSs, regardless of the number of users (click).
In the last two versions of each system the average instance duration decreases as the number of users increases (click).
This decrease is especially noticeable for WfMS B when going from 50 to 500 users (click), where it goes from around 7 milliseconds to around 5.7 milliseconds, and less so when going from 500 to 1'000 users (click), when it goes down to around 5.5 milliseconds.
The throughput is relatively stable between all versions of both WfMSs when 50 users are involved (click).
However, a substantial decrease in throughput is observed in the newest version of both WfMS A and WfMS B when the load is raised to 1’000 users (click), and a slight decrease with 500 users (click).
We also show the weighted average BP instance duration for each model class. Using the same instance duration scale for both WfMSs does not allow us to see what is going on in WfMS A, so we zoom in on its scale.
The mentioned increase in average BP instance duration in newer WfMS versions is not caused by one particular BP, but is noticeable in all BPs. However, different models perform differently on the two WfMSs.
While Class 1 (click) is the fastest in WfMS A, in WfMS B Class 4 (click) is the fastest (click).
And while Class 5 (click), the model with the greatest size, has the longest duration in WfMS B, this is not the case with WfMS A, where Class 3 (click) is the slowest one (click).
Having noticed such differences, we examined the execution data at construct level. WfMS B (click) is on average 7 times slower than WfMS A in executing the call activities. The slower execution of instances in WfMS B also partially results from its slow instance start-up.
WfMS B executes parallel gateways faster than exclusive gateways which might explain the faster execution of Class 2 vs. Class 1.
We also compute resource consumption metrics, based on data with a sampling interval of 1 second. We include the weighted average of CPU and RAM consumption among the different trials. The weights are based on the number of CPU and RAM data points in each experiment round.
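The same weighting idea, applied to the sampled resource data, can be sketched as follows; the sample values are invented for illustration.

```python
# Sketch of the resource-consumption aggregation: CPU (or RAM) is sampled every second,
# and the per-trial averages are combined weighted by the number of data points.
import numpy as np

cpu_samples_per_trial = [np.array([8.0, 9.5, 12.0]),        # % CPU, 1 s sampling interval
                         np.array([7.5, 10.0, 11.0, 13.0]),
                         np.array([9.0, 8.5])]

trial_means = [s.mean() for s in cpu_samples_per_trial]
weights = [len(s) for s in cpu_samples_per_trial]            # data points per trial

weighted_avg_cpu = np.average(trial_means, weights=weights)
max_cpu = max(s.max() for s in cpu_samples_per_trial)        # maximum, reported separately
print(weighted_avg_cpu, max_cpu)
```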
These metrics verify that the WfMSs' performance behavior is not caused by a lack of resources. In fact, the average CPU consumption is at most 13% across the different experiments, while the maximum (not reported in the graph) is 85% for WfMS B and 82% for WfMS A. The average RAM consumption is at most approximately 10 GB, out of the maximum available of 128 GB.
The CPU consumption is comparable between the two systems (click), with slightly higher values for WfMS B especially for 1’000 users for all versions.
For WfMS A the CPU consumption is relatively stable between versions for 50 users, while with 500 and 1'000 users improvements have been made compared to v 7.2.0, with v 7.3.0 using the least CPU. The RAM consumption is relatively stable in WfMS A regardless of the number of users.
For WfMS B, v 5.18 differs from the others with 50 users, but is one of the best with 500 and 1'000 users. In this WfMS, both CPU and RAM consumption increase with the number of users. The increase is especially noticeable in the RAM when going from 50 to 500 users.
A newer version does not always mean increased throughput and decreased instance duration. The comparison we made between versions shows the opposite, especially with a higher number of users.
So if you care about performance, don’t be in a hurry to get the cool new feature you might not use. Measure and look at the data before making a decision.
In the future we plan to:
Add more WfMSs with different configuration settings
Obtain more realistic load functions based on event logs
Apply the methodology on domain specific models
Add support for events, human tasks, and WfMS interaction with external systems
Add new metrics of interest, such as DBMS metrics
So the question we have tackled in this presentation is whether updating to the latest WfMS version is always a good idea. We did it by proposing a method for deriving representative BPs from a company's collection, applying it to a collection, and running experiments on 4 minor versions of two WfMSs. The results have shown that, performance-wise, the newest version is not always the best one. So if you are a user, be careful when deciding to upgrade. If you are a WfMS vendor, do test the performance before releasing a new version. BenchFlow, the tool used in this paper, is open source, and we are more than happy to support you in its use.
Thank you for your attention.