Anas Al Bassit, R&D@EURANOVA Sabri Skhiri, R&D@EURANOVA
1
LEAD: A Formal Specification For
Event Processing
EURA NOVA: A DATA COMPANY
We want to have a positive impact
on the world.
We believe technology is the engine
of change. To embrace the change
and its possibilities, we believe that
we have to stay at the edge of the
knowledge. More : we believe
creating new knowledge is the best
way to see further and to lead the
path to tomorrow.
So, the positive impact on the world
is led by a sound and cutting-edge
knowledge of technology.
Vision
Impacting the society with technologies
3
We develop technologies
for clients and we
implement the solution
they need. We commit on
the results & deliveries.
We bring our cross-domain
knowledge through our
experts to deliver more
value to our clients. On
request of the client, we help
them define strategic
opportunities.
CRAFT
SERVE
EXPLORE
We work on the most
promising IT technologies to
share our R&D knowledge &
build dedicated training
sessions delivered by
renowned experts
OUR MISSION
4
EXPERTS IN
DATA STRATEGY, DATA
ENGINEERING, DATA
ARCHITECTURE, DATA SCIENCE,
GRAPHS
5
Customers’
challenges
R&D
expertise
Products
A UNIQUE AND AMBITIOUS
BUSINESS MODEL
6
HR
Automotive
Bank &
Insurance
MORE THAN 100 USE CASESCustomers’
challenges
Media
Aviation
Telco
RetailPharma
Cross
pollinating
expertise
7
A PRIVATE R&D CENTER
36 SCIENTIFIC PUBLICATIONS SINCE 2008
● 4th Workshop IEEE CA USA, on Real-time and Stream Analytics in Big Data &
Stream Data Management
● Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event
Processing, in 13Th ACM international Conference on distributed and
event-based systems 2019
● Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural
Topic Models. in 16th International Symposium on Neural Networks 2019
(ISNN 2019)
● Aymen Cherif, Salim Jouili, Pairwise Image Ranking with Deep Comparative
Network. ESANN 2018: ES2018-200
● Qile Zhu, Xiaolin Li, Ana Conesa, Cécile Pereira, GRAM-CNN: A deep learning
approach with local context for named entity recognition in biomedical text,
Bioinformatics – May 2018
● Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting
Named • Entities in Topic Modeling, Proceedings of the 56th Annual Meeting
of the Association for Computational Linguistics (Volume 2: Short Papers).
Vol. 2. 2018
● Amine Ghrab, Oscar Romero, Salim Jouili, Sabri Skhiri, Graph BI & Analytics:
Current State and Future Challenges. DaWaK 2018: 3-18
● De Visscher, I.; Stempfel, G.; Rooseleer, F. & Treve, V.; Data mining and
Machine Learning techniques supporting Time-Based Separation concept
deployment, in 37th Digital Avionics Systems Conference (DASC), pp 594-603,
London, UK, September 23-27, 2018
R&D
expertise
8 8
A PRODUCT FACTORY
R&D
expertise
Cutting-edge
solutions
Customers’
challenges
9
OUR RESEARCH TRACKS
Our Research Tracks
Jericho and Asgard
2017 2019
ASGARDJERICHO
2022
Our Research Tracks
Jericho and Asgard
2017 2019
3 Tracks
ELI: Elastic & Large Indexing
ECCO: Easy Cluster Configuration
Optimization
LEAD: Live Event Analysis &
Detection
2022
ASGARDJERICHO
JERICHO
ELI
ELI’s GOAL
Extract relevant information from data lakes containing large
volumes of heterogeneous data
Topic Modeling
Embeddings
Improvements
Learning to
Rank
Aymen Cherif, Salim Jouili, Image retrieval and ranking through Deep Comparative Neural
Networks. ESANN 2018: ES2018-200
Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting Named Entities in
Topic Modeling, Proceedings of the 56th
Luca De Petris, Aymen Cherif, LSTM Siamese Network for Question Answering System, SLSP
2018
Katsiaryna Krasnashchok, Salim Jouili, Hierarchical Attention-Based Neural Topic Model, SLSP
2018
Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models. 16th
International Symposium on Neural Networks 2019 (ISNN 2019)
JERICHO
ECCO
ECCO’s GOAL
Assist users in the configuration & optimisation of their Big Data
frameworks
Select parameters for
performance prediction
and performance
optimization
Given:
● the infrastructure
constraints
● a specific application (job)
● a specific input dataset.
We build an optimiser that
searches for the best parameters
Muaz Twaty, Amine Ghrab, Sabri Skhiri, GraphOpt: a Framework for Automatic Parameters
Tuning of Graph Processing Frameworks EEE International Workshop on Benchmarking,
Performance Tuning and Optimization for Big Data Applications (BPOD 2019)
JERICHO
LEAD
LEAD’s GOAL
Allow users to benefit from stream analytics in a large variety of
domains
Stream 1 Stream 2
Join
Context-aware
application
Stream 3
Followed
by
Detects patterns
Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event Processing, 13Th
ACM international Conference on distributed and event-based systems 2019
Our Research Tracks
Jericho and Asgard
2017 2019 2022
ASGARD
MJOLNIR
RUNE
YGGDRASIL
VAGGDELMIR
JERICHO
4 Tracks
ASGARD
Introduction
ASGARD program
ASGARD program
ASGARD Wall - An absolute barrier
Introduction
ASGARD program
AI & ML in industries
THE NEW WALL
CONNECTDATA Business needs
ASGARD program
AI & ML in industries
THE NEW WALL
CONNECTDATA Business needs
THE COST OF
COMPLEXITY
40%
of business initiatives do not
achieve their objectives because
of the quality/governance of the
data
The minimum cost of an AI/Big Data use case
500,000 - 1 Mi
700,000 positions
are waiting to be filled
+
GDPR compliance costs
$1 to $5 MILLIONS
ASGARD
What kind of barrier ?
70% Of business leaders are worried about the lack
of data scientists in the next 2Y
LEAD: INTRODUCTION
Introduction
23
This presentation is based on the paper published in DEBS
Introduction
24
What is Complex Event Processing?
Systems that are able to detect interesting situations by correlating events from
different streams, transforming and aggregating them, and then generating actions
are referred to as CEP engines
Introduction
25
What is Complex Event Processing?
“A Manchester fan is moving far from
home to see a match”
“Fan” Fact: location & context
Localised more than x times in the last 3
mo at Manchester events
Fan Moving: Now, there is a Manchester Match at
Chelsea & a Manchester fan is located there
Introduction
26
What is Complex Event Processing?
“A goal is scored by Manchester”
Introduction
27
What is Complex Event Processing?
“Fans will need a last-minute hotel”
ACTION: Send ads from the hotel nearby
Introduction
28
Applications
CEP
Fraud detection
Security Information
and Event Management
Quality of Experience Business Activity
Monitoring
Event-driven
marketing
Multimodal Surveillance
CEP Challenges
29
Technical
● Performance(Throughput & Latency)
● Scalability
● State management
CEP Challenges
Technical
● Performance(Throughput & Latency)
● Scalability
● State management
30
Logical
● Ambiguous Semantics (Absence of formalisms
and Selection & Consumption policies)
● Lack of Expressiveness and User-friendliness
● Missing operators (Negations, Sequences,
Repetitions … etc)
Motivation
31
Product Roll-up Tracking
Four streams of events:
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
How to identify good customers?
How to measure the success of the
application in real time?
Motivation
We assume the following four actions per each user and game and within the first 3 days from
installation:
32
Product Roll-up Tracking
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
Motivation
33
Product Roll-up Tracking
There is no CEP framework capable of formulating this problem with less than four queries,
although the patterns are similar to each other and have inter-dependencies.
We assume the following four actions per each user and game and within the first 3 days from
installation:
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
Contributions
34
A pattern algebra that extends the common set of operators in CEP, and defines them
formally using TRIO [1, 2], a logic-based specification language aggrandized with temporal
features
A rule grammar that, using our pattern algebra, allows users to obtain different kinds of
actions, depending on the characteristics of a matched pattern
A novel logical execution plan created based on a combination of timed colored petri nets
with aging tokens [3] and prioritized petri nets [4], that we believe will facilitate the
deployment of this plan in the future.
1
2
3
Roadmap
35
1 3
5
42
THE
PROBLEM
THE
OPERATORS
THE
GRAMMAR
THE LOGICAL
EXEC PLAN
THE JOB
CEP: THE OPERATORS
Pattern Model
Access (GID: 123, UID: 321): 999
37
Event Representation & Formal Definitions
Event Type
Attributes
Values
Event Time
Sequence Operator Repetition Operator
Pattern Model
38
Basic Operators:
● Renaming
● Filtering
LEAD Operators
Core Operators:
● Conjunction
● Disjunction
● Negation
● Sequence
● Repetition
● Subcontext
Temporal Constraints
● Within
● Wait
Selection & Consumption Policies:
● First
● Last
● Adjacent
● Every
● All
● All … Consume
● Repetition Max
● Repetition Min
Pattern Model
39
Context and Sub-context
time
Context
installed installed + 3 days || ac::6
ac::1 ac::2
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
Pattern Model
40
Context and Sub-context
ac::3 time
Context
Sub-context
installed
ac::1 ac::2 ac::3 + 2 days
installed + 3 days || ac::6
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
Pattern Model
41
Context and Sub-context
ac::3 time
Context
Sub-context
installed
ac::1 ac::2 ac::4 + 2 daysac::4
installed + 3 days || ac::6
Middle-success & Leaving (L)
● 3≤ accesses ≤5
● The user did not connect within 2 days after the last access
CEP: THE RULE GRAMMAR
Rule Grammar
43
Grammar
Rule Grammar
44
Grammar
Let’s come back
45
Product Roll-up Tracking
There is no CEP framework capable of formulating this problem with less than four queries,
although the patterns are similar to each other and have inter-dependencies.
We assume the following four actions per each user and game and within the first 3 days from
installation:
3. Middle-success (M)
≥3, and not (S) nor (L)
4. Failure (F)
≤2, 0, 0
1. Success (S)
≥5, ≥2, ≥2
2. Middle-success & Leaving (L)
≥3 and ≤5, 0, 0 and the user did not connect within 2 days
after the last access
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
Let’s come back
46
Product Roll-up Tracking
● installations,
● accesses,
● artifacts bought,
● connect,
● and shares;
≥5, ≥2, ≥2
S
≥3 ≤5, 0, 0
No con. 2 days
after the last
access
L
≥3,
not(S) nor (L)
M
≤2, 0, 0
F
time
Context
installed installed + 3 days || ac::6
≤2
≥3
Rule Grammar
Product Roll-up Tracking Rule
47
Rule Grammar
Product Roll-up Tracking Rule
48
Context
Sub-Context
Rule Grammar
Product Roll-up Tracking Rule
49
Context
Sub-Context
time
Context
_in _in + 3 days || _ac::6
_ac::1_ac::2
Sub-context
_ac::3
Rule Grammar
Product Roll-up Tracking Rule
50
Rule Grammar
Product Roll-up Tracking Rule
51
Rule Grammar
Product Roll-up Tracking Rule
52
≥5, ≥2, ≥2
S
Rule Grammar
Product Roll-up Tracking Rule
53
≥3 ≤5, 0, 0
No con. 2 days
after the last
access
L
Rule Grammar
Product Roll-up Tracking Rule
54
≥3,
not(S) nor (L)
M
CEP: THE EXECUTION PLAN
LOGICAL EXECUTION PLAN
56
Why Petri Nets?
Concurrency &
Synchronization
Places, Transitions,
Edges and Tokens
Probabilistic CEP
LOGICAL EXECUTION PLAN
A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never between places or
between transitions.
57
Petri Net
https://en.wikipedia.org/wiki/Petri_net
Coloured Petri nets allow tokens to have a data value attached to them. This attached data value is called the token color.
Age can be used to invalidate token over time
LOGICAL EXECUTION PLAN
N = (Σ, P, I, IC, OC, TT, 𝛑, IT, G, r0
)
Σ: is a finite set of types (colours),Σ ⊆ E[n]
, n ∈ N;
P≣ [p1
, p2
,..., p|P|
]: is a finite set of places, which can be either stateless, i.e. they pass tokens between transitions, or stateful,
i.e. they preserve tokens in ordered structures;
I: is a finite set of transitions . Transitions are either temporal guards, consumers or intermediate transitions;
IC ⊆ (P x I): is a finite non-empty set of input arcs;
OC ⊆ (I x P): is a finite non-empty set of output arcs;
TT: P ⇒ Σ: is a color function, where each place has a single type that belongs to Σ, and all the tokens on this place must be
of the same type;
𝛑: IC ⇒ NO
is a priority function;
IT: I ⇒ R is a time expression function;
G: I ⇒ boolean is a guard function that maps each transition i ∈ I to a boolean expression over all the incoming arcs IC(i) ⊆
IC;
r0
∈ R is an initial marking from the set of all markings R.
58
Aging tokens Prioritized Colored Petri Net Definition
LOGICAL EXECUTION PLAN
59
LEAD Rules in APCPN
LEAD Rule in APCPN
LOGICAL EXECUTION PLAN
60
LEAD Rules in APCPN
LEAD Rule in APCPN
1 type of token
Every place has 1 type
(union of input token
type)
LOGICAL EXECUTION PLAN
61
LEAD Rules in APCPN
LEAD Rule in APCPN
Source Pattern Compact version
Priorities to transitions
Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
62
LEAD Rules in APCPN
Two forms of sequencing events
Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
63
LEAD Rules in APCPN
Two forms of sequencing events
past (match), stateful
match Now, stateless
Within Operator:
A within 10s from B
(B matched before A)
Sequence Operator:
A followed by B
LOGICAL EXECUTION PLAN
64
LEAD Rules in APCPN
Two forms of sequencing events
Aging tokens for
temporal constraints
LOGICAL EXECUTION PLAN
65
Product Roll-up Tracking APCPN
CEP: STREAMING JOB
The impl. overview
67
Grammar 2 ACPN
compiler
ACPN 2 Flink
Streaming compiler
EMF object tree
Streaming job
68
Keyed Stream
● Key preserving stream (avoiding re-shuffling)
● Kind of hard coded optimization: to ensure that Flink does not reshuffle keys between
operators (already on the task manager)
Stream functions
● Enable
● And
● OR
● ...
The basic foundation: the keyed stream processors & function library
The activation challenge
69
How to feedback the activation on a Stream based on a current event?
Sequence Operator:
A followed by B: The activation of
B starts only when A is received.
A
B
OP OP
OP
You can start processing B
OP
LEAD Stream Operator
The activation challenge
70
How to feedback the activation on a Stream based on a current event?
A
B
OP OP
OP
You can start processing B
Stream Function 2
Each transition in the ACPN becomes a function
● The functions are stored in the state key store
● They can be activated/deactivated by listening control events coming from the “Context”
OP
LEAD Stream Operator
Modeling ACPN choice
71
Grouping the functions only when a transition faces a choice
Stream Function 1
H
L
LL
Petri net prioritized choice
LEAD Stream Operator
Stream Function 1
Stream Function 2
Stream Function 3
LEAD Stream Operator
Stream Function 2
A
B
A
B
72
The context - the control plan
An iterative Stream simulating a pub sub
Context
LEAD Stream Operator
Stream Function 1
LEAD Stream Operator
Stream Function 2
LEAD Stream Operator
Stream Function 3
JOIN process broadcast
Streaming job
73
What logic do functions execute
Sequence Operator:
A followed by B: The activation of
B starts only when A is received.
A
B
Source Operator
A Source Function
1- Get an A event
2- Enable “B Source Function”
Source Operator
Get Control Events
Enable “B Source Function”
B Source Function
1- Get a B event
Sequence Operator
Sequence Function
1- Get an event
2- if (event == A)
2.1- Store in State
if (event == B and !State.isEmpty)
2.1- Produce a sequence event
A --→ B
Garbage Collector Function
1- Discard a collected event B
Data Flow
Control Flow
Challenges to tackle
74
1. Control Stream: How to ensure that the control stream is faster than data stream and
to avoid race conditions ?
2. Matching per partition: is there any case where we need to match cross partitions ?
3. Finishing the end to end code:
a. Petri 2 Flink compiler
b. End to end functional and performance tests on distributed env.
Summary
75
● Both technical and logical challenges were the reasons behind LEAD;
● 18 operators were introduced and formalized using TRIO trying to eliminate ambiguous behaviours;
● The decent set of operators and extending the capabilities of the query language were meant to increase
the expressive power in CEP;
● Aging tokens prioritized colored petri nets, as a logical execution gives a room for optimisations;
Future Work
● Query optimizations on both logical and physical levels
● Benchmarking the performance of LEAD CEP
● Probabilistic CEP
Open Sourcing Dev
76
Preparing a Flink integration proposal and a merge request
Thinking about integration the Stream SQL API as Match recognise
Why not a KStream implementation ?
Contact:
sabri.skhiri@euranova.eu
Tel: +32 483 66 37 90
euranova.eu
research.euranova.eu
digazu.com
78
● A peer-reviewed open access journal
on data in science, with the aim of
enhancing data transparency and
reusability.
● Publishes in two sections:
○ A section on the collection,
treatment and analysis methods
of data in science;
○ A section publishing descriptions
of scientific and scholarly datasets
(one dataset per paper).
Open Access; Included in Scopus, Emerging
Sources Citation Index (ESCI - Web of
Science), DBLP and Inspec
Over 500 authors, reviewers and editors involved
in 2018; Free language check service after paper
accepted
Peer-reviewed, 16.7 days for a first decision;
2.8 days from acceptance to publication
(median processing time in first half 2019)
No space constraints, no extra
space or color charges
data@mdpi.com
www.mdpi.com/journal/data

Lead confluent HQ Dec 2019

  • 1.
    Anas Al Bassit,R&D@EURANOVA Sabri Skhiri, R&D@EURANOVA 1 LEAD: A Formal Specification For Event Processing
  • 2.
    EURA NOVA: ADATA COMPANY
  • 3.
    We want tohave a positive impact on the world. We believe technology is the engine of change. To embrace the change and its possibilities, we believe that we have to stay at the edge of the knowledge. More : we believe creating new knowledge is the best way to see further and to lead the path to tomorrow. So, the positive impact on the world is led by a sound and cutting-edge knowledge of technology. Vision Impacting the society with technologies 3
  • 4.
    We develop technologies forclients and we implement the solution they need. We commit on the results & deliveries. We bring our cross-domain knowledge through our experts to deliver more value to our clients. On request of the client, we help them define strategic opportunities. CRAFT SERVE EXPLORE We work on the most promising IT technologies to share our R&D knowledge & build dedicated training sessions delivered by renowned experts OUR MISSION 4
  • 5.
    EXPERTS IN DATA STRATEGY,DATA ENGINEERING, DATA ARCHITECTURE, DATA SCIENCE, GRAPHS 5
  • 6.
  • 7.
    HR Automotive Bank & Insurance MORE THAN100 USE CASESCustomers’ challenges Media Aviation Telco RetailPharma Cross pollinating expertise 7
  • 8.
    A PRIVATE R&DCENTER 36 SCIENTIFIC PUBLICATIONS SINCE 2008 ● 4th Workshop IEEE CA USA, on Real-time and Stream Analytics in Big Data & Stream Data Management ● Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event Processing, in 13Th ACM international Conference on distributed and event-based systems 2019 ● Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models. in 16th International Symposium on Neural Networks 2019 (ISNN 2019) ● Aymen Cherif, Salim Jouili, Pairwise Image Ranking with Deep Comparative Network. ESANN 2018: ES2018-200 ● Qile Zhu, Xiaolin Li, Ana Conesa, Cécile Pereira, GRAM-CNN: A deep learning approach with local context for named entity recognition in biomedical text, Bioinformatics – May 2018 ● Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting Named • Entities in Topic Modeling, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vol. 2. 2018 ● Amine Ghrab, Oscar Romero, Salim Jouili, Sabri Skhiri, Graph BI & Analytics: Current State and Future Challenges. DaWaK 2018: 3-18 ● De Visscher, I.; Stempfel, G.; Rooseleer, F. & Treve, V.; Data mining and Machine Learning techniques supporting Time-Based Separation concept deployment, in 37th Digital Avionics Systems Conference (DASC), pp 594-603, London, UK, September 23-27, 2018 R&D expertise 8 8
  • 9.
  • 10.
  • 11.
    Our Research Tracks Jerichoand Asgard 2017 2019 ASGARDJERICHO 2022
  • 12.
    Our Research Tracks Jerichoand Asgard 2017 2019 3 Tracks ELI: Elastic & Large Indexing ECCO: Easy Cluster Configuration Optimization LEAD: Live Event Analysis & Detection 2022 ASGARDJERICHO
  • 13.
    JERICHO ELI ELI’s GOAL Extract relevantinformation from data lakes containing large volumes of heterogeneous data Topic Modeling Embeddings Improvements Learning to Rank Aymen Cherif, Salim Jouili, Image retrieval and ranking through Deep Comparative Neural Networks. ESANN 2018: ES2018-200 Katsiaryna Krasnashchok, Salim Jouili, Improving Topic Quality by Promoting Named Entities in Topic Modeling, Proceedings of the 56th Luca De Petris, Aymen Cherif, LSTM Siamese Network for Question Answering System, SLSP 2018 Katsiaryna Krasnashchok, Salim Jouili, Hierarchical Attention-Based Neural Topic Model, SLSP 2018 Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models. 16th International Symposium on Neural Networks 2019 (ISNN 2019)
  • 14.
    JERICHO ECCO ECCO’s GOAL Assist usersin the configuration & optimisation of their Big Data frameworks Select parameters for performance prediction and performance optimization Given: ● the infrastructure constraints ● a specific application (job) ● a specific input dataset. We build an optimiser that searches for the best parameters Muaz Twaty, Amine Ghrab, Sabri Skhiri, GraphOpt: a Framework for Automatic Parameters Tuning of Graph Processing Frameworks EEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2019)
  • 15.
    JERICHO LEAD LEAD’s GOAL Allow usersto benefit from stream analytics in a large variety of domains Stream 1 Stream 2 Join Context-aware application Stream 3 Followed by Detects patterns Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event Processing, 13Th ACM international Conference on distributed and event-based systems 2019
  • 16.
    Our Research Tracks Jerichoand Asgard 2017 2019 2022 ASGARD MJOLNIR RUNE YGGDRASIL VAGGDELMIR JERICHO 4 Tracks
  • 17.
  • 18.
    ASGARD program ASGARD Wall- An absolute barrier Introduction
  • 19.
    ASGARD program AI &ML in industries THE NEW WALL CONNECTDATA Business needs
  • 20.
    ASGARD program AI &ML in industries THE NEW WALL CONNECTDATA Business needs
  • 21.
    THE COST OF COMPLEXITY 40% ofbusiness initiatives do not achieve their objectives because of the quality/governance of the data The minimum cost of an AI/Big Data use case 500,000 - 1 Mi 700,000 positions are waiting to be filled + GDPR compliance costs $1 to $5 MILLIONS ASGARD What kind of barrier ? 70% Of business leaders are worried about the lack of data scientists in the next 2Y
  • 22.
  • 23.
    Introduction 23 This presentation isbased on the paper published in DEBS
  • 24.
    Introduction 24 What is ComplexEvent Processing? Systems that are able to detect interesting situations by correlating events from different streams, transforming and aggregating them, and then generating actions are referred to as CEP engines
  • 25.
    Introduction 25 What is ComplexEvent Processing? “A Manchester fan is moving far from home to see a match” “Fan” Fact: location & context Localised more than x times in the last 3 mo at Manchester events Fan Moving: Now, there is a Manchester Match at Chelsea & a Manchester fan is located there
  • 26.
    Introduction 26 What is ComplexEvent Processing? “A goal is scored by Manchester”
  • 27.
    Introduction 27 What is ComplexEvent Processing? “Fans will need a last-minute hotel” ACTION: Send ads from the hotel nearby
  • 28.
    Introduction 28 Applications CEP Fraud detection Security Information andEvent Management Quality of Experience Business Activity Monitoring Event-driven marketing Multimodal Surveillance
  • 29.
    CEP Challenges 29 Technical ● Performance(Throughput& Latency) ● Scalability ● State management
  • 30.
    CEP Challenges Technical ● Performance(Throughput& Latency) ● Scalability ● State management 30 Logical ● Ambiguous Semantics (Absence of formalisms and Selection & Consumption policies) ● Lack of Expressiveness and User-friendliness ● Missing operators (Negations, Sequences, Repetitions … etc)
  • 31.
    Motivation 31 Product Roll-up Tracking Fourstreams of events: ● installations, ● accesses, ● artifacts bought, ● connect, ● and shares; How to identify good customers? How to measure the success of the application in real time?
  • 32.
    Motivation We assume thefollowing four actions per each user and game and within the first 3 days from installation: 32 Product Roll-up Tracking 3. Middle-success (M) ≥3, and not (S) nor (L) 4. Failure (F) ≤2, 0, 0 1. Success (S) ≥5, ≥2, ≥2 2. Middle-success & Leaving (L) ≥3 and ≤5, 0, 0 and the user did not connect within 2 days after the last access ● installations, ● accesses, ● artifacts bought, ● connect, ● and shares;
  • 33.
    Motivation 33 Product Roll-up Tracking Thereis no CEP framework capable of formulating this problem with less than four queries, although the patterns are similar to each other and have inter-dependencies. We assume the following four actions per each user and game and within the first 3 days from installation: 3. Middle-success (M) ≥3, and not (S) nor (L) 4. Failure (F) ≤2, 0, 0 1. Success (S) ≥5, ≥2, ≥2 2. Middle-success & Leaving (L) ≥3 and ≤5, 0, 0 and the user did not connect within 2 days after the last access ● installations, ● accesses, ● artifacts bought, ● connect, ● and shares;
  • 34.
    Contributions 34 A pattern algebrathat extends the common set of operators in CEP, and defines them formally using TRIO [1, 2], a logic-based specification language aggrandized with temporal features A rule grammar that, using our pattern algebra, allows users to obtain different kinds of actions, depending on the characteristics of a matched pattern A novel logical execution plan created based on a combination of timed colored petri nets with aging tokens [3] and prioritized petri nets [4], that we believe will facilitate the deployment of this plan in the future. 1 2 3
  • 35.
  • 36.
  • 37.
    Pattern Model Access (GID:123, UID: 321): 999 37 Event Representation & Formal Definitions Event Type Attributes Values Event Time Sequence Operator Repetition Operator
  • 38.
    Pattern Model 38 Basic Operators: ●Renaming ● Filtering LEAD Operators Core Operators: ● Conjunction ● Disjunction ● Negation ● Sequence ● Repetition ● Subcontext Temporal Constraints ● Within ● Wait Selection & Consumption Policies: ● First ● Last ● Adjacent ● Every ● All ● All … Consume ● Repetition Max ● Repetition Min
  • 39.
    Pattern Model 39 Context andSub-context time Context installed installed + 3 days || ac::6 ac::1 ac::2 Middle-success & Leaving (L) ● 3≤ accesses ≤5 ● The user did not connect within 2 days after the last access
  • 40.
    Pattern Model 40 Context andSub-context ac::3 time Context Sub-context installed ac::1 ac::2 ac::3 + 2 days installed + 3 days || ac::6 Middle-success & Leaving (L) ● 3≤ accesses ≤5 ● The user did not connect within 2 days after the last access
  • 41.
    Pattern Model 41 Context andSub-context ac::3 time Context Sub-context installed ac::1 ac::2 ac::4 + 2 daysac::4 installed + 3 days || ac::6 Middle-success & Leaving (L) ● 3≤ accesses ≤5 ● The user did not connect within 2 days after the last access
  • 42.
  • 43.
  • 44.
  • 45.
    Let’s come back 45 ProductRoll-up Tracking There is no CEP framework capable of formulating this problem with less than four queries, although the patterns are similar to each other and have inter-dependencies. We assume the following four actions per each user and game and within the first 3 days from installation: 3. Middle-success (M) ≥3, and not (S) nor (L) 4. Failure (F) ≤2, 0, 0 1. Success (S) ≥5, ≥2, ≥2 2. Middle-success & Leaving (L) ≥3 and ≤5, 0, 0 and the user did not connect within 2 days after the last access ● installations, ● accesses, ● artifacts bought, ● connect, ● and shares;
  • 46.
    Let’s come back 46 ProductRoll-up Tracking ● installations, ● accesses, ● artifacts bought, ● connect, ● and shares; ≥5, ≥2, ≥2 S ≥3 ≤5, 0, 0 No con. 2 days after the last access L ≥3, not(S) nor (L) M ≤2, 0, 0 F time Context installed installed + 3 days || ac::6 ≤2 ≥3
  • 47.
  • 48.
    Rule Grammar Product Roll-upTracking Rule 48 Context Sub-Context
  • 49.
    Rule Grammar Product Roll-upTracking Rule 49 Context Sub-Context time Context _in _in + 3 days || _ac::6 _ac::1_ac::2 Sub-context _ac::3
  • 50.
  • 51.
  • 52.
    Rule Grammar Product Roll-upTracking Rule 52 ≥5, ≥2, ≥2 S
  • 53.
    Rule Grammar Product Roll-upTracking Rule 53 ≥3 ≤5, 0, 0 No con. 2 days after the last access L
  • 54.
    Rule Grammar Product Roll-upTracking Rule 54 ≥3, not(S) nor (L) M
  • 55.
  • 56.
    LOGICAL EXECUTION PLAN 56 WhyPetri Nets? Concurrency & Synchronization Places, Transitions, Edges and Tokens Probabilistic CEP
  • 57.
    LOGICAL EXECUTION PLAN APetri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never between places or between transitions. 57 Petri Net https://en.wikipedia.org/wiki/Petri_net Coloured Petri nets allow tokens to have a data value attached to them. This attached data value is called the token color. Age can be used to invalidate token over time
  • 58.
    LOGICAL EXECUTION PLAN N= (Σ, P, I, IC, OC, TT, 𝛑, IT, G, r0 ) Σ: is a finite set of types (colours),Σ ⊆ E[n] , n ∈ N; P≣ [p1 , p2 ,..., p|P| ]: is a finite set of places, which can be either stateless, i.e. they pass tokens between transitions, or stateful, i.e. they preserve tokens in ordered structures; I: is a finite set of transitions . Transitions are either temporal guards, consumers or intermediate transitions; IC ⊆ (P x I): is a finite non-empty set of input arcs; OC ⊆ (I x P): is a finite non-empty set of output arcs; TT: P ⇒ Σ: is a color function, where each place has a single type that belongs to Σ, and all the tokens on this place must be of the same type; 𝛑: IC ⇒ NO is a priority function; IT: I ⇒ R is a time expression function; G: I ⇒ boolean is a guard function that maps each transition i ∈ I to a boolean expression over all the incoming arcs IC(i) ⊆ IC; r0 ∈ R is an initial marking from the set of all markings R. 58 Aging tokens Prioritized Colored Petri Net Definition
  • 59.
    LOGICAL EXECUTION PLAN 59 LEADRules in APCPN LEAD Rule in APCPN
  • 60.
    LOGICAL EXECUTION PLAN 60 LEADRules in APCPN LEAD Rule in APCPN 1 type of token Every place has 1 type (union of input token type)
  • 61.
    LOGICAL EXECUTION PLAN 61 LEADRules in APCPN LEAD Rule in APCPN Source Pattern Compact version Priorities to transitions
  • 62.
    Within Operator: A within10s from B (B matched before A) Sequence Operator: A followed by B LOGICAL EXECUTION PLAN 62 LEAD Rules in APCPN Two forms of sequencing events
  • 63.
    Within Operator: A within10s from B (B matched before A) Sequence Operator: A followed by B LOGICAL EXECUTION PLAN 63 LEAD Rules in APCPN Two forms of sequencing events past (match), stateful match Now, stateless
  • 64.
    Within Operator: A within10s from B (B matched before A) Sequence Operator: A followed by B LOGICAL EXECUTION PLAN 64 LEAD Rules in APCPN Two forms of sequencing events Aging tokens for temporal constraints
  • 65.
    LOGICAL EXECUTION PLAN 65 ProductRoll-up Tracking APCPN
  • 66.
  • 67.
    The impl. overview 67 Grammar2 ACPN compiler ACPN 2 Flink Streaming compiler EMF object tree
  • 68.
    Streaming job 68 Keyed Stream ●Key preserving stream (avoiding re-shuffling) ● Kind of hard coded optimization: to ensure that Flink does not reshuffle keys between operators (already on the task manager) Stream functions ● Enable ● And ● OR ● ... The basic foundation: the keyed stream processors & function library
  • 69.
    The activation challenge 69 Howto feedback the activation on a Stream based on a current event? Sequence Operator: A followed by B: The activation of B starts only when A is received. A B OP OP OP You can start processing B OP
  • 70.
    LEAD Stream Operator Theactivation challenge 70 How to feedback the activation on a Stream based on a current event? A B OP OP OP You can start processing B Stream Function 2 Each transition in the ACPN becomes a function ● The functions are stored in the state key store ● They can be activated/deactivated by listening control events coming from the “Context” OP
  • 71.
    LEAD Stream Operator ModelingACPN choice 71 Grouping the functions only when a transition faces a choice Stream Function 1 H L LL Petri net prioritized choice LEAD Stream Operator Stream Function 1 Stream Function 2 Stream Function 3 LEAD Stream Operator Stream Function 2 A B A B
  • 72.
    72 The context -the control plan An iterative Stream simulating a pub sub Context LEAD Stream Operator Stream Function 1 LEAD Stream Operator Stream Function 2 LEAD Stream Operator Stream Function 3 JOIN process broadcast
  • 73.
    Streaming job 73 What logicdo functions execute Sequence Operator: A followed by B: The activation of B starts only when A is received. A B Source Operator A Source Function 1- Get an A event 2- Enable “B Source Function” Source Operator Get Control Events Enable “B Source Function” B Source Function 1- Get a B event Sequence Operator Sequence Function 1- Get an event 2- if (event == A) 2.1- Store in State if (event == B and !State.isEmpty) 2.1- Produce a sequence event A --→ B Garbage Collector Function 1- Discard a collected event B Data Flow Control Flow
  • 74.
    Challenges to tackle 74 1.Control Stream: How to ensure that the control stream is faster than data stream and to avoid race conditions ? 2. Matching per partition: is there any case where we need to match cross partitions ? 3. Finishing the end to end code: a. Petri 2 Flink compiler b. End to end functional and performance tests on distributed env.
  • 75.
    Summary 75 ● Both technicaland logical challenges were the reasons behind LEAD; ● 18 operators were introduced and formalized using TRIO trying to eliminate ambiguous behaviours; ● The decent set of operators and extending the capabilities of the query language were meant to increase the expressive power in CEP; ● Aging tokens prioritized colored petri nets, as a logical execution gives a room for optimisations; Future Work ● Query optimizations on both logical and physical levels ● Benchmarking the performance of LEAD CEP ● Probabilistic CEP
  • 76.
    Open Sourcing Dev 76 Preparinga Flink integration proposal and a merge request Thinking about integration the Stream SQL API as Match recognise Why not a KStream implementation ?
  • 77.
    Contact: sabri.skhiri@euranova.eu Tel: +32 48366 37 90 euranova.eu research.euranova.eu digazu.com
  • 78.
  • 79.
    ● A peer-reviewedopen access journal on data in science, with the aim of enhancing data transparency and reusability. ● Publishes in two sections: ○ A section on the collection, treatment and analysis methods of data in science; ○ A section publishing descriptions of scientific and scholarly datasets (one dataset per paper). Open Access; Included in Scopus, Emerging Sources Citation Index (ESCI - Web of Science), DBLP and Inspec Over 500 authors, reviewers and editors involved in 2018; Free language check service after paper accepted Peer-reviewed, 16.7 days for a first decision; 2.8 days from acceptance to publication (median processing time in first half 2019) No space constraints, no extra space or color charges data@mdpi.com www.mdpi.com/journal/data