This document discusses a formal specification for event processing called LEAD. It proposes new operators and a rule grammar to address challenges in complex event processing (CEP) like performance, scalability, state management, ambiguous semantics, and lack of expressiveness. The key contributions are a pattern algebra extending common CEP operators, a rule grammar to define patterns and obtain actions, and a novel logical execution plan using timed colored Petri nets to facilitate deployment. An example use case of product roll-up tracking is presented to illustrate how LEAD can formulate a problem with interdependent patterns in fewer queries compared to other CEP frameworks.
3. We want to have a positive impact
on the world.
We believe technology is the engine
of change. To embrace the change
and its possibilities, we believe that
we have to stay at the edge of the
knowledge. More : we believe
creating new knowledge is the best
way to see further and to lead the
path to tomorrow.
So, the positive impact on the world
is led by a sound and cutting-edge
knowledge of technology.
Vision
Impacting the society with technologies
3
4. We develop technologies
for clients and we
implement the solution
they need. We commit on
the results & deliveries.
We bring our cross-domain
knowledge through our
experts to deliver more
value to our clients. On
request of the client, we help
them define strategic
opportunities.
CRAFT
SERVE
EXPLORE
We work on the most
promising IT technologies to
share our R&D knowledge &
build dedicated training
sessions delivered by
renowned experts
OUR MISSION
4
8. A PRIVATE R&D CENTER
36 SCIENTIFIC PUBLICATIONS SINCE 2008
● 4th Workshop IEEE CA USA, on Real-time and Stream Analytics in Big Data &
Stream Data Management
● Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event
Processing, in 13Th ACM international Conference on distributed and
event-based systems 2019
● Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural
Topic Models. in 16th International Symposium on Neural Networks 2019
(ISNN 2019)
● Aymen Cherif, Salim Jouili, Pairwise Image Ranking with Deep Comparative
Network. ESANN 2018: ES2018-200
● Qile Zhu, Xiaolin Li, Ana Conesa, Cécile Pereira, GRAM-CNN: A deep learning
approach with local context for named entity recognition in biomedical text,
Bioinformatics – May 2018
● Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting
Named • Entities in Topic Modeling, Proceedings of the 56th Annual Meeting
of the Association for Computational Linguistics (Volume 2: Short Papers).
Vol. 2. 2018
● Amine Ghrab, Oscar Romero, Salim Jouili, Sabri Skhiri, Graph BI & Analytics:
Current State and Future Challenges. DaWaK 2018: 3-18
● De Visscher, I.; Stempfel, G.; Rooseleer, F. & Treve, V.; Data mining and
Machine Learning techniques supporting Time-Based Separation concept
deployment, in 37th Digital Avionics Systems Conference (DASC), pp 594-603,
London, UK, September 23-27, 2018
R&D
expertise
8 8
12. Our Research Tracks
Jericho and Asgard
2017 2019
3 Tracks
ELI: Elastic & Large Indexing
ECCO: Easy Cluster Configuration
Optimization
LEAD: Live Event Analysis &
Detection
2022
ASGARDJERICHO
13. JERICHO
ELI
ELI’s GOAL
Extract relevant information from data lakes containing large
volumes of heterogeneous data
Topic Modeling
Embeddings
Improvements
Learning to
Rank
Aymen Cherif, Salim Jouili, Image retrieval and ranking through Deep Comparative Neural
Networks. ESANN 2018: ES2018-200
Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting Named Entities in
Topic Modeling, Proceedings of the 56th
Luca De Petris, Aymen Cherif, LSTM Siamese Network for Question Answering System, SLSP
2018
Katsiaryna Krasnashchok, Salim Jouili, Hierarchical Attention-Based Neural Topic Model, SLSP
2018
Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models. 16th
International Symposium on Neural Networks 2019 (ISNN 2019)
14. JERICHO
ECCO
ECCO’s GOAL
Assist users in the configuration & optimisation of their Big Data
frameworks
Select parameters for
performance prediction
and performance
optimization
Given:
● the infrastructure
constraints
● a specific application (job)
● a specific input dataset.
We build an optimiser that
searches for the best parameters
Muaz Twaty, Amine Ghrab, Sabri Skhiri, GraphOpt: a Framework for Automatic Parameters
Tuning of Graph Processing Frameworks EEE International Workshop on Benchmarking,
Performance Tuning and Optimization for Big Data Applications (BPOD 2019)
15. JERICHO
LEAD
LEAD’s GOAL
Allow users to benefit from stream analytics in a large variety of
domains
Stream 1 Stream 2
Join
Context-aware
application
Stream 3
Followed
by
Detects patterns
Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event Processing, 13Th
ACM international Conference on distributed and event-based systems 2019
16. Our Research Tracks
Jericho and Asgard
2017 2019 2022
ASGARD
MJOLNIR
RUNE
YGGDRASIL
VAGGDELMIR
JERICHO
4 Tracks
19. ASGARD program
AI & ML in industries
THE NEW WALL
CONNECTDATA Business needs
20. ASGARD program
AI & ML in industries
THE NEW WALL
CONNECTDATA Business needs
21. THE COST OF
COMPLEXITY
40%
of business initiatives do not
achieve their objectives because
of the quality/governance of the
data
The minimum cost of an AI/Big Data use case
500,000 - 1 Mi
700,000 positions
are waiting to be filled
+
GDPR compliance costs
$1 to $5 MILLIONS
ASGARD
What kind of barrier ?
70% Of business leaders are worried about the lack
of data scientists in the next 2Y
24. Introduction
24
What is Complex Event Processing?
Systems that are able to detect interesting situations by correlating events from
different streams, transforming and aggregating them, and then generating actions
are referred to as CEP engines
25. Introduction
25
What is Complex Event Processing?
“A Manchester fan is moving far from
home to see a match”
“Fan” Fact: location & context
Localised more than x times in the last 3
mo at Manchester events
Fan Moving: Now, there is a Manchester Match at
Chelsea & a Manchester fan is located there
30. CEP Challenges
Technical
● Performance(Throughput & Latency)
● Scalability
● State management
30
Logical
● Ambiguous Semantics (Absence of formalisms
and Selection & Consumption policies)
● Lack of Expressiveness and User-friendliness
● Missing operators (Negations, Sequences,
Repetitions … etc)
31. Motivation
31
Product Roll-up Tracking
Four streams of events:
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
How to identify good customers?
How to measure the success of the
application in real time?
32. Motivation
We assume the following four actions per each user and game and within the first 3 days from
installation:
32
Product Roll-up Tracking
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
33. Motivation
33
Product Roll-up Tracking
There is no CEP framework capable of formulating this problem with less than four queries,
although the patterns are similar to each other and have inter-dependencies.
We assume the following four actions per each user and game and within the first 3 days from
installation:
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
34. Contributions
34
A pattern algebra that extends the common set of operators in CEP, and defines them
formally using TRIO [1, 2], a logic-based specification language aggrandized with temporal
features
A rule grammar that, using our pattern algebra, allows users to obtain different kinds of
actions, depending on the characteristics of a matched pattern
A novel logical execution plan created based on a combination of timed colored petri nets
with aging tokens [3] and prioritized petri nets [4], that we believe will facilitate the
deployment of this plan in the future.
1
2
3
37. Pattern Model
Access (GID: 123, UID: 321): 999
37
Event Representation & Formal Definitions
Event Type
Attributes
Values
Event Time
Sequence Operator Repetition Operator
38. Pattern Model
38
Basic Operators:
● Renaming
● Filtering
LEAD Operators
Core Operators:
● Conjunction
● Disjunction
● Negation
● Sequence
● Repetition
● Subcontext
Temporal Constraints
● Within
● Wait
Selection & Consumption Policies:
● First
● Last
● Adjacent
● Every
● All
● All … Consume
● Repetition Max
● Repetition Min
39. Pattern Model
39
Context and Sub-context
time
Context
installed installed + 3 days || ac::6
ac::1 ac::2
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
40. Pattern Model
40
Context and Sub-context
ac::3 time
Context
Sub-context
installed
ac::1 ac::2 ac::3 + 2 days
installed + 3 days || ac::6
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
41. Pattern Model
41
Context and Sub-context
ac::3 time
Context
Sub-context
installed
ac::1 ac::2 ac::4 + 2 daysac::4
installed + 3 days || ac::6
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
45. Let’s come back
45
Product Roll-up Tracking
There is no CEP framework capable of formulating this problem with less than four queries,
although the patterns are similar to each other and have inter-dependencies.
We assume the following four actions per each user and game and within the first 3 days from
installation:
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
46. Let’s come back
46
Product Roll-up Tracking
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
≥5, ≥2, ≥2
S
≥3 ≤5, 0, 0
No con. 2 days
after the last
access
L
≥3,
not(S) nor (L)
M
≤2, 0, 0
F
time
Context
installed installed + 3 days || ac::6
≤2
≥3
56. LOGICAL EXECUTION PLAN
56
Why Petri Nets?
Concurrency &
Synchronization
Places, Transitions,
Edges and Tokens
Probabilistic CEP
57. LOGICAL EXECUTION PLAN
A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never between places or
between transitions.
57
Petri Net
https://en.wikipedia.org/wiki/Petri_net
Coloured Petri nets allow tokens to have a data value attached to them. This attached data value is called the token color.
Age can be used to invalidate token over time
58. LOGICAL EXECUTION PLAN
N = (Σ, P, I, IC, OC, TT, 𝛑, IT, G, r0
)
Σ: is a finite set of types (colours),Σ ⊆ E[n]
, n ∈ N;
P≣ [p1
, p2
,..., p|P|
]: is a finite set of places, which can be either stateless, i.e. they pass tokens between transitions, or stateful,
i.e. they preserve tokens in ordered structures;
I: is a finite set of transitions . Transitions are either temporal guards, consumers or intermediate transitions;
IC ⊆ (P x I): is a finite non-empty set of input arcs;
OC ⊆ (I x P): is a finite non-empty set of output arcs;
TT: P ⇒ Σ: is a color function, where each place has a single type that belongs to Σ, and all the tokens on this place must be
of the same type;
𝛑: IC ⇒ NO
is a priority function;
IT: I ⇒ R is a time expression function;
G: I ⇒ boolean is a guard function that maps each transition i ∈ I to a boolean expression over all the incoming arcs IC(i) ⊆
IC;
r0
∈ R is an initial marking from the set of all markings R.
58
Aging tokens Prioritized Colored Petri Net Definition
60. LOGICAL EXECUTION PLAN
60
LEAD Rules in APCPN
LEAD Rule in APCPN
1 type of token
Every place has 1 type
(union of input token
type)
61. LOGICAL EXECUTION PLAN
61
LEAD Rules in APCPN
LEAD Rule in APCPN
Source Pattern Compact version
Priorities to transitions
62. Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
62
LEAD Rules in APCPN
Two forms of sequencing events
63. Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
63
LEAD Rules in APCPN
Two forms of sequencing events
past (match), stateful
match Now, stateless
64. Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
64
LEAD Rules in APCPN
Two forms of sequencing events
Aging tokens for
temporal constraints
68. Streaming job
68
Keyed Stream
● Key preserving stream (avoiding re-shuffling)
● Kind of hard coded optimization: to ensure that Flink does not reshuffle keys between
operators (already on the task manager)
Stream functions
● Enable
● And
● OR
● ...
The basic foundation: the keyed stream processors & function library
69. The activation challenge
69
How to feedback the activation on a Stream based on a current event?
Sequence Operator:
A followed by B: The activation of
B starts only when A is received.
A
B
OP OP
OP
You can start processing B
OP
70. LEAD Stream Operator
The activation challenge
70
How to feedback the activation on a Stream based on a current event?
A
B
OP OP
OP
You can start processing B
Stream Function 2
Each transition in the ACPN becomes a function
● The functions are stored in the state key store
● They can be activated/deactivated by listening control events coming from the “Context”
OP
71. LEAD Stream Operator
Modeling ACPN choice
71
Grouping the functions only when a transition faces a choice
Stream Function 1
H
L
LL
Petri net prioritized choice
LEAD Stream Operator
Stream Function 1
Stream Function 2
Stream Function 3
LEAD Stream Operator
Stream Function 2
A
B
A
B
72. 72
The context - the control plan
An iterative Stream simulating a pub sub
Context
LEAD Stream Operator
Stream Function 1
LEAD Stream Operator
Stream Function 2
LEAD Stream Operator
Stream Function 3
JOIN process broadcast
73. Streaming job
73
What logic do functions execute
Sequence Operator:
A followed by B: The activation of
B starts only when A is received.
A
B
Source Operator
A Source Function
1- Get an A event
2- Enable “B Source Function”
Source Operator
Get Control Events
Enable “B Source Function”
B Source Function
1- Get a B event
Sequence Operator
Sequence Function
1- Get an event
2- if (event == A)
2.1- Store in State
if (event == B and !State.isEmpty)
2.1- Produce a sequence event
A --→ B
Garbage Collector Function
1- Discard a collected event B
Data Flow
Control Flow
74. Challenges to tackle
74
1. Control Stream: How to ensure that the control stream is faster than data stream and
to avoid race conditions ?
2. Matching per partition: is there any case where we need to match cross partitions ?
3. Finishing the end to end code:
a. Petri 2 Flink compiler
b. End to end functional and performance tests on distributed env.
75. Summary
75
● Both technical and logical challenges were the reasons behind LEAD;
● 18 operators were introduced and formalized using TRIO trying to eliminate ambiguous behaviours;
● The decent set of operators and extending the capabilities of the query language were meant to increase
the expressive power in CEP;
● Aging tokens prioritized colored petri nets, as a logical execution gives a room for optimisations;
Future Work
● Query optimizations on both logical and physical levels
● Benchmarking the performance of LEAD CEP
● Probabilistic CEP
76. Open Sourcing Dev
76
Preparing a Flink integration proposal and a merge request
Thinking about integration the Stream SQL API as Match recognise
Why not a KStream implementation ?
79. ● A peer-reviewed open access journal
on data in science, with the aim of
enhancing data transparency and
reusability.
● Publishes in two sections:
○ A section on the collection,
treatment and analysis methods
of data in science;
○ A section publishing descriptions
of scientific and scholarly datasets
(one dataset per paper).
Open Access; Included in Scopus, Emerging
Sources Citation Index (ESCI - Web of
Science), DBLP and Inspec
Over 500 authors, reviewers and editors involved
in 2018; Free language check service after paper
accepted
Peer-reviewed, 16.7 days for a first decision;
2.8 days from acceptance to publication
(median processing time in first half 2019)
No space constraints, no extra
space or color charges
data@mdpi.com
www.mdpi.com/journal/data