SCALING PATTERN AND SEQUENCE
QUERIES IN COMPLEX EVENT PROCESSING
V. Mohanadarshan
148241N
Supervisors : Dr. Srinath Perera
Dr. Dilum Bandara
June 2nd, 2017
Research Contribution
● Goal
Propose an approach to scale pattern and sequence detection in Complex Event
Processing (CEP) to support high event rates.
● Importance
Existing approaches solve only a specific subset of the scalability problems
related to pattern and sequence detection.
● Approach
Time-based event partitioning to scale pattern and sequence detection.
● Results
800% improvement in throughput and reduced re-ordering, with a slight increase in latency.
Outline
● Real-time Analytics
● Need for Scaling
● Literature Review
● Methodology
○ Partition Events by Time
○ Handling Event Duplication
○ Event Reordering
● Performance Analysis
● Conclusions
● Future Work
Real-time Analytics
● Processing data on the fly (listening to events and detecting
patterns) while storing a minimal amount of information and
responding fast (from <1 ms to a few seconds).
● Built on the idea of event streams: a series of events in
time.
● Enabling technologies
○ Stream Processing (Storm)
○ Complex Event Processing
Complex Event Processing
Source: Mark Simms, Microsoft StreamInsight (http://www.slideshare.net/markginnebaugh/microsoft-streaminsight)
How CEP Works?
Pattern and Sequence Detection
● Pattern and sequence detection is the crown jewel of CEP.
● It detects sequences of events that occur in order and are
correlated based on the values of their attributes.
● Event patterns are implemented using a specialized state machine
approach.
from every a1 = transactionStream[a1.amountWithdrawed < 100]
-> a2 = transactionStream[(a1.toAccountNo == a2.fromAccountNo) and (a2.amountWithdrawed > 10000)]
within 5 min
select a1.fromAccountNo as suspectAccountNo
insert into possibleMoneyLaunderingActivityStream;
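To make the state machine idea concrete, the query above can be approximated in a few lines of Python. This is only a sketch under assumptions, not how Siddhi implements patterns: the `Txn` record, its field names, and the `detect` function are hypothetical, timestamps are assumed to be in seconds, and events are assumed to arrive in timestamp order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    ts: int          # event time in seconds (assumed field)
    from_acc: str    # stands in for fromAccountNo
    to_acc: str      # stands in for toAccountNo
    amount: float    # stands in for amountWithdrawed


WITHIN = 5 * 60  # 'within 5 min', expressed in seconds


def detect(events):
    """Yield the suspect account for each completed two-state match."""
    pending = []  # partial matches: events that satisfied the first state
    for ev in events:
        # Discard partial matches whose time window has expired.
        pending = [a1 for a1 in pending if ev.ts - a1.ts <= WITHIN]
        # Second state: a large withdrawal from the account a1 paid into.
        still_pending = []
        for a1 in pending:
            if ev.amount > 10000 and a1.to_acc == ev.from_acc:
                yield a1.from_acc
            else:
                still_pending.append(a1)
        pending = still_pending
        # First state with 'every': each small withdrawal opens a new partial match.
        if ev.amount < 100:
            pending.append(ev)
```

Each partial match is a small piece of state that must live in one engine instance for the whole 'within' window, which is why pattern queries are harder to parallelize than stateless filters.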
Important Features in CEP
● High Availability
● Scalability
● Distributed Processing
● Visual Composition
● Performance
● Debugger
Need for Scaling
● Scaling - The ability of a CEP system to handle larger or more complex queries by adding
more resources.
● Most CEP engines run on a single large machine, i.e., they scale up vertically.
Scaling CEP has several dimensions:
1. Handling a large number of queries
2. Queries that need a large working memory
3. Handling a complex query that might not fit within a single machine
4. Handling a large number of events
● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available:
http://srinathsview.blogspot.com/2012/05/how-to-scale-complex-event-processing.html. [Dec. 23, 2014].
How to provide large-scale pattern and
sequence detection in CEP while supporting
high event rates?
EXISTING APPROACHES
Common Types of Scaling
Scaling divides into two types: Vertical Scaling and Horizontal Scaling.
Partition Based Scaling
● R. Mayer, B. Koldehofe, and K. Rothermel, “Meeting Predictable Buffer Limits in the Parallel Execution of Event Processing Operators,” In Proc. IEEE BigData '14, Washington, USA, Oct 2014, pp. 402–411.
● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available: http://srinathsview.blogspot.com/2012/05/how-to-scale-complex-event-processing.html.
Publisher-Subscriber Based Scaling
● V. Govindasamy and Prof. Dr. P. Thambidura, “An Efficient and Generic Filtering Approach for Uncertain Complex Event Processing,” In Proc. International Conference on Data Mining and Computer Engineering, Bangkok, Thailand, Dec 2012, pp. 211-216.
Storm-Based Scaling
● T. Dudziak, Storm & Esper [online], Available: https://tomdzk.wordpress.com/2011/09/28/storm-esper/. [Jan. 06, 2015].
● S. Ravindra, WSO2 CEP 4.0.0 in Distributed Mode [online]. Available: http://sajithr.blogspot.com/2015/09/wso2-cep-400-in-distributed-mode.html. [Feb. 23, 2017].
Distributed Object Cache Based Scaling
● Magmasystems Blog, CEP Engines and Object Caches [online]. Available:
http://magmasystems.blogspot.com/2008/02/cep-engines-and-object-caches.html. [Dec. 23, 2014].
Scaling by Integrating with ESB
● The key architectural insight in the system is to separate the integration
functionalities of the ESB from the complex event facilities.
● The ESB is stateless and can be scaled out by adding more processing nodes.
● The CEP cluster can then be tuned to handle high throughput and scaled out separately.
● A. Aalto, “Scalability of Complex Event Processing as a part of a distributed Enterprise Service Bus,” Ph.D. dissertation, Dept. Science., Aalto University, Espoo, 2012.
METHODOLOGY
Key Stages of the Solution
● Incoming events are partitioned based on the 'within' value defined in the query.
● The pattern is detected within each partition.
● Duplicated events are removed.
● Events are reordered based on timestamp.
Partition Events by Time
from every h1 = hitStream -> h2 = hitStream[h1.pid != pid and h1.tid == tid] -> h3 = hitStream[h1.pid == pid]
within 5 seconds
select h1.pid as player1, h2.pid as player2, h3.pid as player3, h1.tsr as tStamp1, h2.tsr as tStamp2, h3.tsr as tStamp3
insert into patternMatchedStream;
The query looks for the following three states:
1. A ball hit by a player x of team 1
2. Then, a ball hit by another player y of the opponent team 2
3. Finally, a ball hit by the same player x who hit first.
All three states must occur within 5 seconds.
Partition Events by Time - Overview
● Incoming events are queued at the entry to the CEP engine.
● Events in the queue are then partitioned based on their time values.
● Each partitioned event group is then pushed to one of the CEP instances running in parallel.
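The queue-then-partition step can be sketched as follows. The slides do not spell out the exact slicing rule, so this Python sketch assumes three things loudly: each time slice is extended by the 'within' interval so that a pattern straddling a slice boundary is still seen whole by one instance, slices are handed to instances round-robin, and the function name `partition_by_time` and its parameters are hypothetical.

```python
def partition_by_time(events, partition_ms, within_ms, n_instances):
    """Group (timestamp_ms, payload) events into overlapping time slices.

    Slice k covers [k * partition_ms, (k + 1) * partition_ms + within_ms).
    The trailing within_ms is the assumed overlap with the next slice.
    Returns a list of (instance_id, slice_events) in slice order.
    """
    # With a longer overlap an event could belong to more than two slices,
    # which this sketch does not handle.
    assert within_ms <= partition_ms
    slices = {}
    for ts, payload in events:
        k = ts // partition_ms
        slices.setdefault(k, []).append((ts, payload))
        # Events at the start of slice k also fall in the overlap of slice k - 1.
        if k > 0 and ts - k * partition_ms < within_ms:
            slices.setdefault(k - 1, []).append((ts, payload))
    # Assumed dispatch policy: round-robin over the parallel CEP instances.
    return [(k % n_instances, slices[k]) for k in sorted(slices)]
```

Under this assumption, any event in an overlap region is processed by two instances, so a pattern lying entirely inside that region is reported twice. That is one plausible source of the duplicates that the later duplication-handling stage removes.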
Partition Events by Time (contd.)
Event Reordering and Duplication Handling
define stream patternMatchedStream
(player1 string, player2 string, player3 string,
tStamp1 long, tStamp2 long, tStamp3 long);
from patternMatchedStream#window.kslack(10000)
select *
insert into filteredOutputStream;
Event Reordering
K-slack based Event Reordering
● K-slack transparently buffers and reorders events before they are processed by event
detectors.
● Buffering and sorting delay the processing of the input events by the query operator and thus
increase the latency of the query results.
● It dynamically adjusts the buffer size to a big-enough value to accommodate all late arrivals,
aiming to provide near-exact query results.
● M. Li, M. Liu, L. Ding, E. A. Rundensteiner and M. Mani, “Event Stream Processing with Out-of-Order Data Arrival,” In Proc. 27th International Conference on
Distributed Computing Systems Workshops, Toronto, Canada, Jun 2007, pp. 67.
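The buffering idea behind K-slack can be illustrated with a small Python sketch. It is a simplification under stated assumptions: K is fixed here, whereas the approach described above adjusts it dynamically, and the class `KSlackBuffer` and its methods are hypothetical names rather than the Siddhi `kslack` window implementation.

```python
import heapq


class KSlackBuffer:
    """Reorder events by timestamp, tolerating lateness of up to k time units."""

    def __init__(self, k):
        self.k = k
        self.max_ts = None   # largest timestamp seen so far
        self._heap = []      # min-heap ordered by timestamp
        self._seq = 0        # tie-breaker so payloads are never compared

    def push(self, ts, payload):
        """Buffer one event and return the events that are now safe to emit."""
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1
        if self.max_ts is None or ts > self.max_ts:
            self.max_ts = ts
        released = []
        # An event is safe once the stream has advanced k units past it.
        while self._heap and self._heap[0][0] <= self.max_ts - self.k:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        released = []
        while self._heap:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released
```

Every event waits in the buffer until the stream has advanced K units beyond it, which is where the added latency comes from. An event that arrives more than K units late is still emitted out of order, since earlier events have already been released.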
Event Duplication Handling
● Event duplication can be handled using a HashSet-based data structure.
● A HashSet is a collection that uses a hash table for storage. A hash
table stores information by using a mechanism called hashing.
● We wrote a hash function for events that computes the hash code from
the attributes of the event.
● The hash code is then used as the index at which the data associated with
the key is stored.
Figure source: http://computersecuritypsh.wikia.com/wiki/Hash_Function
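The same idea can be sketched in Python, where the built-in `set` plays the role of Java's HashSet and a tuple of attribute values plays the role of the custom hash function. The function `deduplicate` and the sample records are hypothetical; the field names are taken from the select clause of the soccer query.

```python
def deduplicate(matches, key_fields):
    """Return matches with exact repeats removed, keeping first occurrences."""
    seen = set()   # hash-based set of attribute tuples already emitted
    unique = []
    for match in matches:
        # The key is built from the event's attributes, so two matches with
        # identical attribute values map to the same set entry.
        key = tuple(match[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


FIELDS = ("player1", "player2", "player3", "tStamp1", "tStamp2", "tStamp3")

first = {"player1": "x", "player2": "y", "player3": "x",
         "tStamp1": 100, "tStamp2": 250, "tStamp3": 400}
second = {"player1": "x", "player2": "z", "player3": "x",
          "tStamp1": 900, "tStamp2": 950, "tStamp3": 990}

# The same detection reported by two instances collapses to one output event.
result = deduplicate([first, dict(first), second], FIELDS)
```

In this sketch the set grows without bound. A long-running deployment would need to evict old keys, for example those older than the reordering buffer can still deliver.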
Implementation - Architecture
EVALUATION
Benchmark
The soccer monitoring benchmark is based on the DEBS (Distributed Event-Based Systems) 2013
Grand Challenge.
● The data used for this benchmark was collected
by the real-time locating system deployed
on a football field in Germany.
● 47 million events in total.
● Average event size is 365 bytes.
● Every event describes the position of a
given sensor in a 3D coordinate system.
● DEBS Org, DEBS 2013 Grand Challenge: Soccer monitoring [online]. Available: http://debs.org/?p=41. [Jan. 05th, 2017]
Evaluation Setup
● Implemented a proof-of-concept (POC) setup to evaluate the default Siddhi CEP engine and our implementation*
● Tests were conducted with Oracle JDK 1.7.0_79-b15
● Hardware configuration:
Property   Value
Cores      32 and 16
Memory     Min 16 GB, Max 18 GB
CPU        Intel(R) Xeon(R) E5-2470, 2.30 GHz
Cache      L3: 20 MB
* https://github.com/mohanvive/siddhi-2.x
Evaluation - Throughput
Throughput improved by 800% in the proposed solution when Siddhi instance count is 20.
Figures: Throughput of the default Siddhi CEP engine; Throughput of the proposed solution on multi-core machines.
Evaluation - Throughput
Figure: Throughput vs. 'within' time interval (32-core machine).
Evaluation - Resource Utilization
Figures: CPU usage in the default WSO2 Siddhi engine during processing; CPU usage in the proposed solution.
* On the 32-core machine, with 20 Siddhi instances and a 4-second partition time
Evaluation - Resource Utilization
Figures: Thread count in the default WSO2 Siddhi engine during processing; Thread count in the proposed solution.
* On the 32-core machine, with 20 Siddhi instances and a 4-second partition time
Evaluation - Accuracy
Figures: Duplicated events (in %) vs. Siddhi instance count; Disordered events (in %) vs. Siddhi instance count.
13%-20% of events were duplicated and 3%-11% of events were disordered compared to the
patterns detected by the default Siddhi CEP engine.
Evaluation - Latency
Figures: Latency in the default WSO2 Siddhi CEP engine; Latency in the proposed solution.
Per-event latency increased from 2-3 milliseconds to 8-20 milliseconds (with a Siddhi instance count of 20).
SUMMARY
Summary
● Proposed a time-based partitioning approach to scale pattern and sequence CEP queries.
● The scaling approach is independent of the internal implementation of the CEP engine.
● Proposed an approach to overcome the event duplication and event reordering that arise
due to the use of multiple CEP engines.
● Achieved an 800% improvement in throughput.
● Provides 100% accuracy for use cases that expect ‘at-least-once’ QoS.
● Evaluated and verified the effectiveness of the solution across various parameters
(‘within’ time, number of Siddhi instances, etc.).
● Can be used to scale other CEP queries that can be partitioned by time.
Limitations
● The proposed solution is not ideal for pattern and sequence
queries that have a large ‘within’ time.
● Due to the buffering and partitioning nature of the solution, pattern detection can be
duplicated and the output might contain duplicated events. It is not well suited to
scenarios that require ‘exactly-once’ QoS.
● The number of Siddhi instances is a user-configured value.
● Due to parallelism during processing, pattern-detected events can get
reordered.
Future Work
● Self-tune the number of Siddhi instances based on hardware resource consumption and other
factors such as throughput and latency.
● Explore the possibility of scaling pattern queries that have a longer ‘within’ time.
● Implement the proposed approach in a distributed environment and verify its effectiveness.
● Explore other options to remove event duplication and reorder events.
THANK YOU
wso2.com
QUESTIONS?

More Related Content

Similar to Scaling Pattern and Sequence Queries in Complex Event Processing

MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
MongoDB
 
Big Data Lessons from the Cloud
Big Data Lessons from the CloudBig Data Lessons from the Cloud
Big Data Lessons from the Cloud
MapR Technologies
 
The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...
freedomotic
 
IRJET- Secure Distributed Data Mining
IRJET- Secure Distributed Data MiningIRJET- Secure Distributed Data Mining
IRJET- Secure Distributed Data Mining
IRJET Journal
 
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified ApproachImplementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
rahulmonikasharma
 
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified ApproachImplementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
rahulmonikasharma
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
WSO2
 
Event Processing Using Semantic Web Technologies
Event Processing Using Semantic Web TechnologiesEvent Processing Using Semantic Web Technologies
Event Processing Using Semantic Web Technologies
Mikko Rinne
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB
 
Semantic-based Process Analysis
Semantic-based Process AnalysisSemantic-based Process Analysis
Semantic-based Process Analysis
Mauro Dragoni
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
 
The influence of data size on a high-performance computing memetic algorithm ...
The influence of data size on a high-performance computing memetic algorithm ...The influence of data size on a high-performance computing memetic algorithm ...
The influence of data size on a high-performance computing memetic algorithm ...
journalBEEI
 
PIDS research slides from MALCON 2018 conference - Asaf Hecht
PIDS research slides from MALCON 2018 conference - Asaf HechtPIDS research slides from MALCON 2018 conference - Asaf Hecht
PIDS research slides from MALCON 2018 conference - Asaf Hecht
Asaf Hecht
 
Data-Driven Analysis of Batch Processing Inefficiencies in Business Processes
Data-Driven Analysis of  Batch Processing Inefficiencies  in Business ProcessesData-Driven Analysis of  Batch Processing Inefficiencies  in Business Processes
Data-Driven Analysis of Batch Processing Inefficiencies in Business Processes
Marlon Dumas
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
Etienne Chauchot
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Andrii Gakhov
 
Digital Document Preservation Simulation - Boston Python User's Group
Digital Document  Preservation Simulation - Boston Python User's GroupDigital Document  Preservation Simulation - Boston Python User's Group
Digital Document Preservation Simulation - Boston Python User's Group
Micah Altman
 
SKG-2013, Beijing, China, 03 October 2013
SKG-2013, Beijing, China, 03 October 2013SKG-2013, Beijing, China, 03 October 2013
SKG-2013, Beijing, China, 03 October 2013
Charith Perera
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
DataWorks Summit
 

Similar to Scaling Pattern and Sequence Queries in Complex Event Processing (20)

MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
 
Big Data Lessons from the Cloud
Big Data Lessons from the CloudBig Data Lessons from the Cloud
Big Data Lessons from the Cloud
 
The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...
 
IRJET- Secure Distributed Data Mining
IRJET- Secure Distributed Data MiningIRJET- Secure Distributed Data Mining
IRJET- Secure Distributed Data Mining
 
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified ApproachImplementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
 
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified ApproachImplementation of Banker’s Algorithm Using Dynamic Modified Approach
Implementation of Banker’s Algorithm Using Dynamic Modified Approach
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
 
Event Processing Using Semantic Web Technologies
Event Processing Using Semantic Web TechnologiesEvent Processing Using Semantic Web Technologies
Event Processing Using Semantic Web Technologies
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema Design
 
Semantic-based Process Analysis
Semantic-based Process AnalysisSemantic-based Process Analysis
Semantic-based Process Analysis
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
The influence of data size on a high-performance computing memetic algorithm ...
The influence of data size on a high-performance computing memetic algorithm ...The influence of data size on a high-performance computing memetic algorithm ...
The influence of data size on a high-performance computing memetic algorithm ...
 
rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
 
PIDS research slides from MALCON 2018 conference - Asaf Hecht
PIDS research slides from MALCON 2018 conference - Asaf HechtPIDS research slides from MALCON 2018 conference - Asaf Hecht
PIDS research slides from MALCON 2018 conference - Asaf Hecht
 
Data-Driven Analysis of Batch Processing Inefficiencies in Business Processes
Data-Driven Analysis of  Batch Processing Inefficiencies  in Business ProcessesData-Driven Analysis of  Batch Processing Inefficiencies  in Business Processes
Data-Driven Analysis of Batch Processing Inefficiencies in Business Processes
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
Digital Document Preservation Simulation - Boston Python User's Group
Digital Document  Preservation Simulation - Boston Python User's GroupDigital Document  Preservation Simulation - Boston Python User's Group
Digital Document Preservation Simulation - Boston Python User's Group
 
SKG-2013, Beijing, China, 03 October 2013
SKG-2013, Beijing, China, 03 October 2013SKG-2013, Beijing, China, 03 October 2013
SKG-2013, Beijing, China, 03 October 2013
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 

Recently uploaded

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
CarlosHernanMontoyab2
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 

Recently uploaded (20)

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 

Scaling Pattern and Sequence Queries in Complex Event Processing

  • 1. SCALING PATTERN AND SEQUENCE QUERIES IN COMPLEX EVENT PROCESSING V. Mohanadarshan 148241N Supervisors : Dr. Srinath Perera Dr. Dilum Bandara June 2nd, 2017
  • 2. Research Contribution ● Goal Propose an approach to scale pattern and sequence detection in Complex Event Processing (CEP) to enable high event rate. ● Importance Existing approaches only solve specific subset of pattern and sequence detection related scalability problems. ● Approach Time-based event partitioning to scale pattern and sequence detection. ● Results 800% improvement in throughput and reduced re-ordering, slight increase in latency 2
  • 3. Outline ● Real-time Analytics ● Need for Scaling ● Literature Review ● Methodology ○ Partition Events by Time ○ Handling Event Duplication ○ Event Reordering ● Performance Analysis ● Conclusions ● Future Work 3
  • 4. Real-time Analytics ● Processing (listening to events and detecting patterns) Data on the fly, while storing minimal amount of information and responding fast (from <1 ms to few seconds). ● Idea of Event streams, a series of events in time. ● Enabling technologies ○ Stream Processing (Storm) ○ Complex Event Processing 4
  • 5. Complex Event Processing 5Source : Mark Simms, Microsoft Streaminsight (http://www.slideshare.net/markginnebaugh/microsoft-streaminsight)
  • 7. Pattern and Sequence Detection ● Pattern and sequence detection is the crown-jewel of CEP. ● Addresses a sequence of events that occur in order and are correlated based on values of their attributes. ● Event patterns are implemented using a specialized state machine approach. 7 from every (a1 = transactionStream [a1.amountWithdrawed < 100] → a2 = transactionStream [(a1.toAccountNo == a2.fromAccountNo) and (amountWithdrawed > 10000)] within 5 min select a1.fromAccountNo as suspectAccountNo insert into possibleMoneyLaunderingActivityStream;
  • 8. Important Features in CEP ● High Availability ● Scalability ● Distributed Processing ● Visual Composition ● Performance ● Debugger 8
  • 9. Need for Scaling ● Scaling - Ability for a CEP system to handle larger or complex queries by adding more resources ● Mostly CEP engines run in a large box, scaling up horizontally. Scaling CEP has several dimensions: 1. Handling Large no of queries 2. Queries that needs large working memory 3. Handling a complex query that might not fit within a single machine 4. Handling large number of events 9 ● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available: http://srinathsview.blogspot.com/2012/05/how-to-scale-complex-event-processing.html. [Dec. 23, 2014].
  • 10. How to provide large-scale pattern and sequence detection in CEP while supporting high event rates?
  • 12. Common Types of Scaling 12 Scaling Vertical Scaling Horizontal Scaling
  • 13. Partition Based Scaling 13 ● R. Mayer, B. Koldehofe, and K. Rothermel, “Meeting Predictable Buffer Limits in the Parallel Execution of Event Processing Operators,” In Proc. IEEE BigData ‟04, Washington, USA, Oct 2014, pp. 402–411. ● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available: http://srinathsview.blogspot.com/2012/05/how-to-scale-complex- event-processing.html.
  • 14. Publisher-Subscriber Based Scaling 14 ● V. Govindasamy and Prof. Dr. P. Thambidura, “An Efficient and Generic Filtering Approach for Uncertain Complex Event Processing,” In Proc International Conference on Data Mining and Computer Engineering, Thailand, Bangkok, Dec 2012, pp. 211-216.
  • 15. Storm-Based Scaling ● T. Dudziak, Storm & Esper [online], Available: https://tomdzk.wordpress.com/2011/09/28/storm-esper/. [Jan. 06, 2015]. ● S. Ravindra, WSO2 CEP 4.0.0 in Distributed Mode [online]. Available: http://sajithr.blogspot.com/2015/09/wso2-cep-400-in-distributed-mode.html. [Feb. 23, 2017]. 15
  • 16. Distributed Object Cache Based Scaling ● Magmasystems Blog, CEP Engines and Object Caches [online]. Available: http://magmasystems.blogspot.com/2008/02/cep-engines-and-object-caches.html. [Dec. 23, 2014].
  • 17. Scaling by Integrating with ESB ● The key architectural insight in the system is to separate the integration functionalities of the ESB from the complex event facilities. ● The ESB is stateless, so it can be scaled out by adding more processing nodes. ● The CEP cluster can then be tuned to handle high throughput and scaled out separately. ● A. Aalto, “Scalability of Complex Event Processing as a part of a distributed Enterprise Service Bus,” Ph.D. dissertation, Dept. Science., Aalto University, Espoo, 2012.
  • 19. Key Stages of the Solution ● Incoming events are partitioned based on the 'within' value defined in the query. ● The pattern is detected within each partition. ● Duplicated events are removed. ● Events are reordered based on timestamp.
  • 20. Partition Events by Time from every h1 = hitStream -> h2 = hitStream[h1.pid != pid and h1.tid == tid] -> h3 = hitStream[h1.pid == pid] within 5 seconds select h1.pid as player1, h2.pid as player2, h3.pid as player3, h1.tsr as tStamp1, h2.tsr as tStamp2, h3.tsr as tStamp3 insert into patternMatchedStream; Here we are looking for the following three states: 1. A ball hit from a player x of team 1. 2. Then, a ball hit from another player y of the opponent team 2. 3. Finally, a ball hit from the same player x who hit first. Moreover, these three states need to happen within 5 seconds.
  • 21. Partition Events by Time - Overview ● Incoming events are queued at the entry to the CEP engine. ● Events in the queue are then partitioned based on their time values. ● Each partitioned event group is then pushed to one of the CEP instances running in parallel.
  • 22. Partition Events by Time (contd.)
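The partitioning step above can be sketched in Python. The slides do not spell out the exact window layout, so this is a minimal sketch under an assumption: window k covers the interval from k * within to (k + 2) * within, so neighbouring windows overlap by one 'within' period and any match that spans at most 'within' fits entirely inside one window. The function name `partition_by_time`, the round-robin assignment, and the (timestamp, payload) event shape are all hypothetical.

```python
from collections import defaultdict


def partition_by_time(events, within_ms, instance_count):
    """Group (timestamp_ms, payload) events into overlapping time windows.

    Assumed layout (not taken from the slides): window k covers
    [k * within_ms, (k + 2) * within_ms). Each event therefore lands in
    at most two windows, which is one way duplicates can arise downstream.
    """
    batches = defaultdict(list)  # window index -> events in that window
    for ts, payload in events:
        k = ts // within_ms
        for window in (k - 1, k):
            if window >= 0:
                batches[window].append((ts, payload))

    # Hand windows to the parallel CEP instances in round-robin order.
    assignment = defaultdict(list)  # instance id -> [(window, events), ...]
    for window in sorted(batches):
        assignment[window % instance_count].append((window, batches[window]))
    return dict(assignment)
```

Because each event is copied into two adjacent windows, the same pattern match may be reported by two instances, which is consistent with the duplication handling stage described in the slides.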
  • 23. Event Reordering and Duplication Handling define stream patternMatchedStream (player1 string, player2 string, player3 string, tStamp1 long, tStamp2 long, tStamp3 long); from patternMatchedStream#window.kslack(10000) select * insert into filteredOutputStream;
  • 24. Event Reordering - K-slack based Event Reordering ● K-slack transparently buffers and reorders events before they are processed by event detectors. ● Buffering and sorting delay the processing of the input events by the query operator, thus increasing the latency of the query results. ● It dynamically adjusts the buffer size to a value big enough to accommodate all late arrivals, aiming to provide near-exact query results. ● M. Li, M. Liu, L. Ding, E. A. Rundensteiner and M. Mani, “Event Stream Processing with Out-of-Order Data Arrival,” In Proc. 27th International Conference on Distributed Computing Systems Workshops, Toronto, Canada, Jun 2007, pp. 67.
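The buffering idea behind K-slack can be illustrated with a small Python sketch. It is a simplification, not the Siddhi `kslack` window: K is fixed here, whereas the slide describes a buffer size that is adjusted dynamically. The class name `KSlackBuffer` and its methods are hypothetical.

```python
import heapq


class KSlackBuffer:
    """Reorder events by timestamp using a fixed slack of k_ms milliseconds."""

    def __init__(self, k_ms):
        self.k_ms = k_ms
        self.max_ts = None  # largest timestamp seen so far
        self._heap = []     # min-heap ordered by timestamp
        self._seq = 0       # tie-breaker so payloads are never compared

    def add(self, ts, payload):
        """Buffer one event and return the events that are now safe to emit."""
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1
        if self.max_ts is None or ts > self.max_ts:
            self.max_ts = ts
        released = []
        # An event is released once it is at least k_ms older than the newest one.
        while self._heap and self._heap[0][0] <= self.max_ts - self.k_ms:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        released = []
        while self._heap:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released
```

Every event waits in the buffer until a newer event pushes the maximum timestamp at least K ahead of it, which is where the added latency mentioned above comes from. Events that arrive more than K late would still be emitted out of order in this fixed-K version.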
  • 25. Event Duplication Handling ● Event duplication can be handled using a HashSet-based data structure. ● A HashSet is a collection that uses a hash table for storage. A hash table stores information by using a mechanism called hashing. ● We wrote a hash function for the event that computes the hash code from the event's attributes. ● The hash code is then used as the index at which the data associated with the key is stored. Figure source: http://computersecuritypsh.wikia.com/wiki/Hash_Function
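The HashSet-based approach above can be sketched in Python, where the built-in `set` is also backed by a hash table and stands in for Java's HashSet. This is an illustration rather than the authors' implementation: the function name `drop_duplicates` and the dictionary representation of a matched event are hypothetical, and the attribute names follow the select clause of the hit-stream query.

```python
def drop_duplicates(matches, key_fields):
    """Keep the first occurrence of each match, comparing on key_fields only.

    The tuple of attribute values plays the role of the event's hash key:
    Python hashes it to locate the entry inside the set.
    """
    seen = set()
    unique = []
    for match in matches:
        key = tuple(match[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Attributes assumed to identify a detected pattern (hypothetical choice).
MATCH_KEY = ("player1", "player2", "player3", "tStamp1", "tStamp2", "tStamp3")
```

Including the three timestamps in the key means two detections count as duplicates only when they refer to exactly the same three hits, so genuine repeated patterns between the same players are kept.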
  • 28. Benchmark ● The soccer monitoring benchmark is based on the DEBS (Distributed Event Based Systems) 2013 Grand Challenge. ● Data used for this benchmark was collected by the real-time locating system deployed on a football field in Germany. ● 47 million events in total. ● Average event size is 365 bytes. ● Every event describes the position of a given sensor in a 3D coordinate system. ● DEBS Org, DEBS 2013 Grand Challenge: Soccer monitoring [online]. Available: http://debs.org/?p=41. [Jan. 05, 2017]
  • 29. Evaluation Setup ● Implemented a proof-of-concept (POC) setup to evaluate the Siddhi CEP engine and our implementation* ● Tests were conducted with Oracle JDK 1.7.0_79-b15 ● Hardware configuration (property: value): Cores: 32 and 16; Memory: min 16 GB and max 18 GB; CPU: Intel® Xeon® core E5-2470, 2.30 GHz; Cache: L3 20 MB * https://github.com/mohanvive/siddhi-2.x
  • 30. Evaluation - Throughput Throughput improved by 800% in the proposed solution when the Siddhi instance count is 20. Figures: throughput of the default Siddhi CEP engine; throughput of the proposed solution on multi-core machines.
  • 31. Evaluation - Throughput Figure: throughput vs. 'within' time interval (32-core machine).
  • 32. Evaluation - Resource Utilization Figures: CPU usage in the default WSO2 Siddhi engine when processing; CPU usage in the proposed solution*. * On the 32-core machine, with 20 Siddhi instances and a 4-second partition time.
  • 33. Evaluation - Resource Utilization Figures: thread count in the default WSO2 Siddhi engine when processing; thread count in the proposed solution*. * On the 32-core machine, with 20 Siddhi instances and a 4-second partition time.
  • 34. Evaluation - Accuracy Figures: duplicated events (in %) vs. Siddhi instance count; disordered events (in %) vs. Siddhi instance count. 13% - 20% of events were duplicated and 3% - 11% of events were disordered compared to the patterns detected by the default Siddhi CEP engine.
  • 35. Evaluation - Latency Figures: latency in the default WSO2 Siddhi CEP engine; latency in the proposed solution. Per-event latency increased from 2-3 milliseconds to 8-20 milliseconds (with a Siddhi instance count of 20).
  • 37. Summary ● Proposed a time-based partitioning approach to scale pattern and sequence CEP queries. ● The scaling approach is independent of the internal implementation of a CEP engine. ● Proposed an approach to overcome the event duplication and event reordering that arise from the use of multiple CEP engines. ● Achieved an 800% improvement in throughput. ● Provides 100% accuracy for use cases that expect 'at-least-once' QoS. ● Evaluated and verified the effectiveness of the solution across various attributes ('within' time, number of Siddhi instances, etc.). ● Can be used to scale other CEP queries that can be partitioned by time.
  • 38. Limitations ● The proposed solution is not an ideal approach for pattern and sequence queries that have a large 'within' time. ● Due to the buffering and partitioning nature of the solution, pattern detection can be duplicated and the output might contain duplicated events. It is not well suited to scenarios that require 'exactly-once' QoS. ● The Siddhi instance count is a user-configured value. ● Due to the parallelism in processing, pattern-detected events can get reordered.
  • 39. Future Work ● Self-tuning the Siddhi instance count based on hardware resource consumption and other factors such as throughput and latency. ● Exploring the possibility of scaling pattern queries that have a longer 'within' time. ● Implementing the proposed approach in a distributed environment and verifying its effectiveness. ● Exploring other options to remove event duplication and reorder events.