Monitoring Business Process Compliance Across Multiple Executions with Stream Processing

Chukri Soueidi
Yliès Falcone
Université Grenoble-Alpes
France
Sylvain Hallé
Université du Québec à Chicoutimi
Canada

C. Soueidi, Y. Falcone, S. Hallé
A school admission process
Start: Application Submitted
a. Assign Officer
b. Review Application
c. Request Documents
d. Schedule Interview
e. Conduct Interview
f. Evaluate Interview
g. Acceptance Letter
h. Rejection Letter
End: Admission Process Ends

Executions of this process are called cases or instances.
Process log
These executions can be recorded in the form of
a log

Process log
Formally, let...
Σ = {σ1, ..., σn} be a set of events; a sequence σ ∈ Σ* is a trace
C = {c1, ..., cm} be a set of case identifiers
A log is a function λ : C → Σ*, mapping each case identifier to a trace.
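
As a minimal sketch of these definitions (not taken from the authors' implementation), a log can be modelled in Python as a dictionary from case identifiers to traces. The identifiers `c1` and `c2` and the sample traces are hypothetical and only serve as an illustration.

```python
from typing import Dict, Tuple

Event = str                  # an element of Σ
Trace = Tuple[Event, ...]    # an element of Σ*
Log = Dict[str, Trace]       # λ : C → Σ*

# Hypothetical log with two cases (identifiers and traces are illustrative only)
log: Log = {
    "c1": ("a", "b", "d", "e", "f", "g"),
    "c2": ("a", "b", "c"),
}


def trace_of(log: Log, case_id: str) -> Trace:
    """Return λ(c), the trace associated with case identifier c."""
    return log[case_id]
```

A dictionary fits the definition directly: the keys play the role of C and each lookup is an application of λ.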

Querying the log
Many questions can be asked about this log:

Querying the log
"Constraints" (answer by yes or no):
Does the process start with a?
Does a come after b?
Does the case end after fewer than n events?
"Queries" (answer by something else):
What is the number of events in a trace?
What are the events in the trace?
What is the average delay between two events?

Querying the log
Given a trace σ ∈ Σ*, a query is a function
q : Σ* → D
for some arbitrary image D. A constraint is the
particular case where D = {⊤,⊥}.

Querying the log
q : Σ* → D
A trace σ is compliant with respect to a constraint q iff q(σ) = ⊤.
A log λ is compliant with respect to a constraint q iff q(λ(c)) = ⊤ for every case identifier c ∈ C.
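
The definitions of query, constraint and compliance can be sketched in Python as follows. This is an illustration under simple assumptions (traces as tuples of strings, logs as dictionaries); the function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Trace = Tuple[str, ...]
Log = Dict[str, Trace]
Constraint = Callable[[Trace], bool]   # a query whose image D is {⊤, ⊥}


def length(trace: Trace) -> int:
    """A query: the number of events in a trace."""
    return len(trace)


def starts_with_a(trace: Trace) -> bool:
    """A constraint: does the trace start with event a?"""
    return len(trace) > 0 and trace[0] == "a"


def log_is_compliant(log: Log, q: Constraint) -> bool:
    """A log is compliant iff q(λ(c)) = ⊤ for every case identifier c."""
    return all(q(trace) for trace in log.values())
```

Note that compliance of a log only ever looks at one trace at a time, which is the limitation that hyper-queries lift.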

Querying the log
A query (and compliance) is concerned with each
execution taken in isolation. Generalize queries to
calculations involving multiple executions:

Querying the log
"Hyper-Constraints" (answer by yes or no):
Are there at least 50% of cases ending in g?
Is the average case duration below k?
"Hyper-Queries" (answer by something else):
What is the maximum number of concurrent active cases?
What event occurs in the fewest cases?

Hyper-querying the log
Given a log λ ∈ L, where L is the set of all logs, a hyper-query is a function
q : L → D
for some arbitrary image D. A hyper-constraint is
the particular case where D = {⊤,⊥}.
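
To make the difference with a classical query concrete, here is a small Python sketch of a hyper-query and a hyper-constraint that take the whole log as input. The function names are hypothetical, and returning 0.0 on an empty log is an assumption of this sketch.

```python
from typing import Dict, Tuple

Trace = Tuple[str, ...]
Log = Dict[str, Trace]


def fraction_ending_in(log: Log, event: str) -> float:
    """A hyper-query: the fraction of cases whose trace currently ends in `event`."""
    if not log:
        return 0.0  # assumption: an empty log yields 0
    hits = sum(1 for trace in log.values() if trace and trace[-1] == event)
    return hits / len(log)


def at_least_half_end_in_g(log: Log) -> bool:
    """A hyper-constraint: do at least 50% of the cases end in g?"""
    return fraction_ending_in(log, "g") >= 0.5
```

Neither function can be written as a per-trace constraint, since the answer depends on how many other cases end in g.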

Evaluating hyper-queries
Two goals:
A: provide formal means of defining expressive (hyper) queries (stateful, aggregations, unconventional types)
P: evaluate these queries incrementally as the log is produced

Incremental evaluation
Case 1: The complete log is available at once; the hyperquery is evaluated on the log.
Case 2: One complete case is processed at a time.
Case 3: One event from each case is processed at a time (e.g. a, a, a, a, then b, b, b, b, then d, c, c, d, ...).
Case 4: Events from each case are arbitrarily interleaved (e.g. a, a, b, a, d, a, b, ...).
State of the art
Classical constraints: Temporal Logic(s), Petri nets, Finite-state machines (A/P: ✓ ✓ ✓)
Classical queries: Stream equations, Stream pipelines (A/P: ✓ ✓)
Hyper-constraints: Hyper-LTL, HyperLDLf, Event calculus (A/P: ± ? ✓)
Hyper-queries: SQL (incremental), APQL (A/P: ✓ ?)

Log updates
Log update: a special type of log λ̂ with a single event for a single case, e.g. {c ↦ a}
L̂ ⊆ L is the set of updates
Update operation: an operator ∘ : L × L̂ → L that appends an event to an existing log, written λ ∘ λ̂
Example: applying the update {c ↦ b} to a log appends b to the trace of case c and leaves the other cases unchanged

Log updates
A log is progressively built through a sequence (i.e. a stream) of log updates from an empty log:
∅ ∘ λ̂1 ∘ λ̂2 ∘ λ̂3 ∘ ...
where each update λ̂i adds one event (a, a, b, ...) to one case
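
The update operation and the progressive construction of a log can be sketched in Python as a fold over a stream of updates. Representing an update as a `(case identifier, event)` pair is an assumption of this sketch, as are the function names.

```python
from functools import reduce
from typing import Dict, Iterable, Tuple

Trace = Tuple[str, ...]
Log = Dict[str, Trace]
Update = Tuple[str, str]   # (case identifier, event): one event for one case


def apply_update(log: Log, update: Update) -> Log:
    """The ∘ operator: append the update's event to the trace of its case."""
    case_id, event = update
    new_log = dict(log)  # leave the input log untouched
    new_log[case_id] = log.get(case_id, ()) + (event,)
    return new_log


def build_log(updates: Iterable[Update]) -> Log:
    """Fold a stream of updates over the empty log."""
    return reduce(apply_update, updates, {})
```

For instance, `build_log([("c1", "a"), ("c2", "a"), ("c1", "b")])` produces a log where `c1` has trace `a b` and `c2` has trace `a`.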

Problem (Hyper-processors)
Given a hyperquery q : L → D, define a processor πq : L* → D* such that
πq(λ1 ⋅ ... ⋅ λn)[n] = q(λ1 ∘ ... ∘ λn)
In other words, πq incrementally evaluates q on each update.
Avoid re-evaluating q from scratch every time!
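
The requirement on πq can be illustrated with a small Python sketch. The hyper-query chosen here (the number of cases currently ending in g) and all names are hypothetical; the point is only that the incremental version keeps a small state instead of rebuilding the log at each update.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Update = Tuple[str, str]  # (case identifier, event)


def cases_ending_in_g(updates: Iterable[Update]) -> Iterator[int]:
    """Incremental processor: after each update, output the number of cases
    whose trace currently ends in g, keeping only the last event of each case."""
    last_event: Dict[str, str] = {}
    count = 0
    for case_id, event in updates:
        if last_event.get(case_id) == "g":
            count -= 1           # this case no longer ends in g...
        if event == "g":
            count += 1           # ...unless the new event is g itself
        last_event[case_id] = event
        yield count


def from_scratch(updates: Iterable[Update]) -> int:
    """Reference evaluation: rebuild the whole log, then evaluate q on it."""
    log: Dict[str, List[str]] = {}
    for case_id, event in updates:
        log.setdefault(case_id, []).append(event)
    return sum(1 for trace in log.values() if trace[-1] == "g")
```

The n-th output of `cases_ending_in_g` should equal `from_scratch` applied to the first n updates, which mirrors the equation above.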

Solution (Hyper-processors)
Follow a compositional approach:
elementary (incremental) hyperquery processors
allow composition to express complex hyperqueries
Hyper-processors:
Operators on log updates
Log combinations
Qualified conditions

XES Source
Reads a log file in XES format, and outputs it as an interleaved stream of log updates.
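
As an illustration of what such a source might do, the following Python sketch parses an XES document with the standard library and emits `(case identifier, event name)` pairs. It is not the library's implementation: ordering the updates by the `time:timestamp` attribute is an assumption of this sketch, and every event is assumed to carry both `concept:name` and `time:timestamp`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

Update = Tuple[str, str]  # (case identifier, event name)


def local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def attr(element: ET.Element, key: str) -> str:
    """Value of the child attribute element (string, date, ...) with this key."""
    for child in element:
        if child.get("key") == key:
            return child.get("value", "")
    return ""


def xes_updates(xml_text: str) -> List[Update]:
    """Turn an XES document into a list of updates ordered by timestamp
    (ordering by time:timestamp is an assumption of this sketch)."""
    stamped = []
    for trace in ET.fromstring(xml_text):
        if local(trace.tag) != "trace":
            continue
        case_id = attr(trace, "concept:name")
        for event in trace:
            if local(event.tag) == "event":
                when = datetime.fromisoformat(attr(event, "time:timestamp"))
                stamped.append((when, case_id, attr(event, "concept:name")))
    stamped.sort(key=lambda item: item[0])  # stable sort: ties keep file order
    return [(case_id, name) for _, case_id, name in stamped]
```

Sorting by timestamp is one way to recover an interleaving from a file that stores each case contiguously; other orderings are possible.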

Sample
Evaluate condition f on first event σ of each case; retain case in log if f(σ) = ⊤.
Example with f testing equality to a: the update carrying a is output, and so is a subsequent update carrying b.
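
The behaviour described for Sample can be sketched as a Python generator. Updates are assumed to be `(case identifier, event)` pairs and the function name is hypothetical; this is an illustration, not the library's code.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Update = Tuple[str, str]  # (case identifier, event)


def sample(updates: Iterable[Update], f: Callable[[str], bool]) -> Iterator[Update]:
    """Decide once per case, on its first event, whether the case is retained;
    then let through every update of the retained cases."""
    retained: Dict[str, bool] = {}
    for case_id, event in updates:
        if case_id not in retained:
            retained[case_id] = f(event)   # first event of this case
        if retained[case_id]:
            yield (case_id, event)
```

The decision is taken on the first event only, so a case that starts with b is dropped even if an a occurs later in it.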

Filter
Run processor P on each case; only output updates for cases where P returns ⊤.
Example: P counts the events of a case (cumulative sum of 1s) and checks that the count is ≥ 2.
When the condition becomes ⊤, all the updates from the start of the case are output (here a, then b).
Subsequent updates (here c) are let through.
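
A possible reading of Filter, sketched in Python: updates of a case are buffered until the condition holds, then flushed, and later updates pass through directly. For simplicity, P is modelled here as a plain function on the trace seen so far rather than as a stateful processor; that simplification and the names are assumptions of this sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Update = Tuple[str, str]  # (case identifier, event)
Trace = Tuple[str, ...]


def filter_cases(updates: Iterable[Update],
                 p: Callable[[Trace], bool]) -> Iterator[Update]:
    """Output the updates of a case only once p holds on that case."""
    buffered: Dict[str, List[str]] = {}   # events of cases not yet accepted
    accepted: Set[str] = set()
    for case_id, event in updates:
        if case_id in accepted:
            yield (case_id, event)        # subsequent updates are let through
            continue
        events = buffered.setdefault(case_id, [])
        events.append(event)
        if p(tuple(events)):
            accepted.add(case_id)
            for past in buffered.pop(case_id):
                yield (case_id, past)     # flush everything since the case start
```

With `p = lambda trace: len(trace) >= 2`, a case produces nothing on its first event and both events on its second, as in the example.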

Slice
Run an instance of processor P on each case; output resulting log as a new stream of updates.
Example: P puts two successive events in a tuple; a case receiving a then b yields an update carrying ab.
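
The per-case instantiation performed by Slice can be sketched in Python as follows. The `Pairs` class stands in for the example processor that groups two successive events; the class, its `push` method and the function name are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Update = Tuple[str, str]  # (case identifier, event or output value)


class Pairs:
    """Per-case processor: outputs the previous and current event joined together."""

    def __init__(self) -> None:
        self.previous: Optional[str] = None

    def push(self, event: str) -> Optional[str]:
        out = None if self.previous is None else self.previous + event
        self.previous = event
        return out


def slice_cases(updates: Iterable[Update],
                make_processor: Callable[[], Pairs]) -> Iterator[Update]:
    """Feed each case's events to its own processor instance and re-emit
    the outputs as updates of the same case."""
    instances: Dict[str, Pairs] = {}
    for case_id, event in updates:
        if case_id not in instances:
            instances[case_id] = make_processor()   # one fresh instance per case
        out = instances[case_id].push(event)
        if out is not None:
            yield (case_id, out)
```

Because every case has its own instance, the pair emitted for one case never mixes events from another case, whatever the interleaving.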

Quantification (Log combinations)
Given quantifiers Q1σ1, ..., Qnσn on cases, evaluate quantified expression P(σ1, ..., σn) for a processor P.
Example (∀∃, with P comparing first events for equality): for every case, there exists another one that starts with the same event.
In the example, the verdict is ⊥ after the first two updates (a, then b) and becomes ⊤ after a third update (a).
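
The ∀∃ example can be evaluated incrementally with a small amount of state. The Python sketch below is specialised to that one property (it is not a general quantifier processor), and its names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Update = Tuple[str, str]  # (case identifier, event)


def forall_exists_same_start(updates: Iterable[Update]) -> Iterator[bool]:
    """After each update, output True iff every case has another case
    that starts with the same event."""
    first_event: Dict[str, str] = {}   # case -> its first event
    starters: Dict[str, int] = {}      # event -> number of cases starting with it
    lonely = 0                         # first events used by exactly one case
    for case_id, event in updates:
        if case_id not in first_event:
            first_event[case_id] = event
            starters[event] = starters.get(event, 0) + 1
            if starters[event] == 1:
                lonely += 1            # a new, so far unmatched, first event
            elif starters[event] == 2:
                lonely -= 1            # the unmatched case just found a partner
        yield lonely == 0
```

Only the first event of each case matters, so later events of a known case leave the verdict unchanged.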

Aggregation (Log combinations)
Run processor P on each case and aggregate their last output with function processor P'.
Example: P counts the distinct events of each case; P' computes the average of these counts.
As the per-case counts evolve through (1), (1, 1), (2, 1) and (2, 1, 1), the output is 1, 1, 1.5 and 1.3.
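
The aggregation example (average number of distinct events per case) can be sketched in Python as below. The sketch hard-codes both P and P' for this one example, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Update = Tuple[str, str]  # (case identifier, event)


def average_distinct_events(updates: Iterable[Update]) -> Iterator[float]:
    """After each update, output the average over all cases of the number
    of distinct events seen so far in each case."""
    distinct: Dict[str, Set[str]] = {}
    total = 0   # sum over cases of their distinct-event counts
    for case_id, event in updates:
        seen = distinct.setdefault(case_id, set())
        if event not in seen:
            seen.add(event)
            total += 1
        yield total / len(distinct)
```

Feeding it an a for a first case, a b for a second case, a b for the first case and a b for a third case gives 1, 1, 1.5 and about 1.33, in line with the example.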

Weakening (Qualified conditions)
Consider the verdict of a hyper-constraint ψ only for logs that satisfy another hyper-constraint φ.
Example: ψ = the average case length does not exceed k; φ = the log has at least 5 cases.
Is it "just" a logical implication?
1. Separates the policy (ψ) from the condition under which it applies (φ)
2. Three-valued implication: distinguishes between a non-violation caused by
satisfaction (φ = ⊤, ψ = ⊤)
exemption (φ = ?, ψ = *)
(Figure: truth table of the three-valued implication)
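
The distinction between satisfaction and exemption can be sketched in Python with a third verdict. This sketch makes two assumptions that are not taken from the slides: φ is evaluated as a plain Boolean, and `None` stands in for the "exempt" verdict. It does not reproduce the exact truth table of the three-valued implication.

```python
from typing import Callable, Dict, Optional, Tuple

Log = Dict[str, Tuple[str, ...]]
HyperConstraint = Callable[[Log], bool]


def weaken(phi: HyperConstraint,
           psi: HyperConstraint) -> Callable[[Log], Optional[bool]]:
    """Return psi's verdict on logs satisfying phi, and None (exempt) otherwise."""
    def verdict(log: Log) -> Optional[bool]:
        return psi(log) if phi(log) else None
    return verdict


def at_least_5_cases(log: Log) -> bool:
    """phi: the log has at least 5 cases."""
    return len(log) >= 5


def average_length_at_most(k: float) -> HyperConstraint:
    """psi: the average case length does not exceed k."""
    def psi(log: Log) -> bool:
        if not log:
            return True  # assumption: vacuously satisfied on an empty log
        return sum(len(trace) for trace in log.values()) / len(log) <= k
    return psi
```

A plain Boolean implication would return True both for a small log and for a large compliant one; the third value keeps the two situations apart.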

Dampening
Tolerate violations of a hyper-constraint; either...
temporary violations in a case (e.g. at most m out of n successive events); example with m=3, n=4
violations of a fraction of all cases (e.g. at most m/n of all cases); example with m=3, n=4
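
Both forms of dampening can be sketched in Python. The sketch reads "at most m out of n" as a bound on the number of violations, and counts over the events seen so far while the window is not yet full; both readings are assumptions, as are the function names.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def dampen_window(verdicts: Iterable[bool], m: int, n: int) -> Iterator[bool]:
    """Output True as long as at most m of the last n verdicts are violations."""
    window: Deque[bool] = deque(maxlen=n)   # keeps only the n most recent verdicts
    for verdict in verdicts:
        window.append(verdict)
        yield sum(1 for w in window if not w) <= m


def dampen_fraction(case_verdicts: Iterable[bool], m: int, n: int) -> bool:
    """True iff at most m/n of all cases violate the constraint."""
    verdicts = list(case_verdicts)
    violations = sum(1 for v in verdicts if not v)
    return violations * n <= m * len(verdicts)   # integer form of violations/len <= m/n
```

The fraction test is written with a multiplication rather than a division, which avoids both rounding issues and a division by zero on an empty collection of cases.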

Implementation
These "building blocks" have been implemented as an extension of the event stream processing library
https://github.com/liflab/hypercompliance
They can be freely mixed with other processors from the library to form complex hyper-queries

Example
"All cases that contain an a must end in the same state" (consistency)

Example
"The number of tasks of an employee may not exceed the global average by a factor k."

Experimental results
Scenario   Events   Cases   Hyperquery             Throughput (Hz)   Max memory (B)
Hospital   151434   1143    Concurrent instances   901392            8059
Hospital   151434   1143    Directly follows       277351            3660379
Hospital   151434   1143    Mean time interval     1130104           5615
Hospital   151434   1143    Average length         369351            522071
Hospital   151434   1143    Same next              304084            5401821
CAP        275287   13087   Same next              87698             38400801
WABO       39881    937     Same next              302128            2669627
Throughput and memory consumption for a sample of properties evaluated on real-life logs (details in the paper)

Conclusion
Hyper-constraints and hyper-queries are calculated on a set of process executions called a log
Elementary computing units (processors) can evaluate hyper-queries in an incremental (i.e. "real-time") fashion
Expressive hyper-queries can be obtained by composing these units
These concepts have been...
1. formally defined
2. concretely implemented in a stream processing library

What's next?
Journal paper in preparation, considering a generalization of a hyperquery:
q : L → D vs. q̂ : L̂* → D
i.e. q̂ depends on the log and the precise interleavings of its updates.
Thank you!
