Complex Event Processing (CEP) plays a major role in real-time analytics such as identifying possible fraud in credit card transactions and geospatial analysis. In CEP, events received from different data sources are stored in memory and processed on the fly. Scalability is one of the most important features of a CEP engine. Contemporary CEP engines provide several options to scale event processing vertically and horizontally, such as scaling with a Storm cluster, a distributed object cache, or a publisher-subscriber model, all of which rely on random or attribute-based partitioning. These approaches help to handle large numbers of queries, queries that need a large amount of memory, events that arrive at a high rate, and complex queries that might not fit within a single machine. However, it is difficult to scale pattern and sequence detection in CEP to high event rates, because pattern and sequence detection depend on a set of events that happened over time. Existing scaling approaches based on random or attribute-based partitioning disrupt the continuous event flow and event ordering, which are the most important properties for pattern and sequence matching.
We propose a novel approach to scale pattern and sequence detection queries for high incoming event rates. In the proposed approach, incoming events are kept in a queue, grouped into partitions based on the time interval defined in the query with some overlapping events, and then pushed to several CEP engines to be processed simultaneously. Finally, the processed events are filtered and reordered before being published out of the CEP engine. Performance analysis showed that the proposed technique increases throughput by 800%, while increasing per-event latency from 2-3 milliseconds to 8-10 milliseconds due to the queuing nature of the solution.
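The partitioning step described above can be sketched in plain Python. This is a minimal illustration only, not the authors' implementation: the function name and the window/overlap parameters are assumptions, and a real engine would partition a live queue rather than a list.

```python
def partition_events(events, window_ms, overlap_ms):
    """Group (timestamp_ms, payload) events into time-based partitions.

    Each partition covers one query window plus `overlap_ms` of the next
    window, so a pattern spanning a partition boundary is still seen
    whole by at least one CEP engine.
    """
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])  # preserve event ordering
    start, end = events[0][0], events[-1][0]
    partitions = []
    t = start
    while t <= end:
        lo, hi = t, t + window_ms + overlap_ms  # extend window by overlap
        partitions.append([e for e in events if lo <= e[0] < hi])
        t += window_ms
    return partitions
```

Each partition can then be dispatched to a separate CEP engine; matches from the overlap region must be deduplicated and reordered downstream, which is the filtering/reordering step the abstract mentions.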
Real time intrusion detection in network traffic using adaptive and auto-scal... - Gobinath Loganathan
Oral presentation of Real-time Intrusion Detection in Network Traffic Using Adaptive and Auto-scaling Stream Processor
at IEEE Global Communications Conference (Globecom 2018).
Abstract:
Advanced intrusion detection systems are beginning to utilize the power and flexibility offered by Complex Event Processing (CEP) engines. Adapting to new attacks and optimizing CEP rules are two challenges in this domain. Optimizing CEP rules requires a complete framework which can be ported to stream processors, because a CEP rule cannot run without a stream processor. External dependencies of stream processors make a CEP rule a black box which is hard to optimize. In this paper, we present a novel adaptive and functionally auto-scaling stream processor, "Wisdom", with a built-in hybrid optimizer developed using Particle Swarm Optimization and bisection algorithms to optimize CEP rule parameters. We show that an adaptive "Wisdom" rule tuned by the proposed optimization algorithm is able to detect selected attacks in the CICIDS 2017 dataset with an average precision of 99.98% and an average recall of 93.42% while processing over 2.5 million events per second. The proposed distributed functionally auto-scaling deployment mode consumes significantly fewer system resources than the monolithic deployment of CEP rules.
This presentation describes an intelligent IT monitoring solution that uses Nagios as the source of information, Esper as the CEP engine, and a PCA algorithm.
[WSO2Con EU 2018] Patterns for Building Streaming Apps - WSO2
This presentation explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented. We look at the following:
- The Architecture of WSO2 Stream Processor
- Understanding streaming constructs
- Patterns of processing data in real-time, incrementally and with intelligence
- Applying patterns when building streaming apps
- Deployment patterns
An introduction to streaming data, the difference between batch processing and stream processing, research issues in streaming data processing, performance evaluation metrics, and tools for stream processing.
Lifecycle Inference on Unreliable Event Data - Databricks
A common motif in data science tasks is the inference of a latent, time-variant property from observations of singular timestamped events. This task is particularly prevalent in the cybersecurity domain, where a substantial portion of analysis effort is dedicated to system and network event logs. While such an inference can be straightforward in circumstances where data provenance is known and controlled, consider the situation where erroneous events exist or the events represent an unknown fraction of a total population. Such confounding factors complicate inference on third-party datasets. Determining the lifetime during which an organization operates a digital asset (e.g. an IP address) based on observations of the asset is one manifestation of this inference task that is critical to security ratings services, which require an asset inventory of rated organizations. Leveraging third-party data sources can improve the coverage of an asset inventory; however, false positives and unknown sampling rates of observation events in external, unmanaged data sources can degrade the veracity of inferred asset attributions. This talk will describe Spark DataFrames tradecraft for addressing these challenges by generating attribution lifetime windows from asset observations that are reinforced and extended by future observations. While the concept of an event "refresh" is not traditionally considered within the MapReduce paradigm, it is trivial to implement within Spark through creative application of multiple Window functions. The described approach provides multiple desirable functionalities, including a parameter-tunable observation reinforcement threshold to exclude false positives or outlying observations, event deduplication through time-box partitioning, and a natural mechanism to age off asset assignments that are no longer valid.
This technique, entirely encapsulated in Spark, enables BitSight to run digital asset attribution simulations across billions of records and months of collection to evaluate, improve, and ultimately deploy novel digital asset discovery methodologies.
Author: Austin Allshouse
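The reinforce-and-extend windowing this talk describes can be sketched outside Spark in plain Python. This is a conceptual sketch only: the actual approach uses Spark Window functions, and the `refresh` interval and `min_obs` reinforcement-threshold names here are illustrative assumptions.

```python
def lifetime_windows(observations, refresh, min_obs):
    """Merge timestamped observations into attribution lifetime windows.

    A window is extended while the next observation arrives within
    `refresh` time units of the previous one; windows backed by fewer
    than `min_obs` observations are dropped as likely false positives
    or outliers.
    """
    windows = []
    current = []  # observations in the window being built
    for t in sorted(observations):
        if current and t - current[-1] > refresh:
            # gap too large: close the current window (natural age-off)
            if len(current) >= min_obs:
                windows.append((current[0], current[-1]))
            current = []
        current.append(t)
    if len(current) >= min_obs:
        windows.append((current[0], current[-1]))
    return windows
```

For example, with `refresh=5` and `min_obs=2`, observations at times 1, 2, 3, 10, 50 yield a single window (1, 3): the isolated observations at 10 and 50 are excluded by the reinforcement threshold.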
Imagine that self-driving cars now exist and are becoming widespread around the world. To facilitate the transition, it is necessary to set up a central service to monitor traffic conditions nationwide and to deploy sensors throughout the interstate system that monitor traffic conditions, including car speeds, pavement and weather conditions, as well as accidents, construction, and other sources of traffic tie-ups.
MongoDB has been selected as the database for this application. In this webinar, we will walk through designing the application’s schema that will both support the high update and read volumes as well as the data aggregation and analytics queries.
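As one hedged illustration of the kind of schema such a workload might use (the field names and the bucketing choice are assumptions, not the webinar's actual design), sensor readings could be grouped into one document per sensor per minute, the classic bucketing pattern for high-volume time-series writes:

```python
# Hypothetical bucketed document: one document per sensor per minute,
# so high-rate updates become array pushes instead of per-reading inserts.
sensor_bucket = {
    "sensor_id": "I-90-mile-42",   # illustrative sensor identifier
    "minute": "2024-01-01T12:00",  # bucket key
    "count": 2,                    # number of readings in this bucket
    "readings": [
        {"ts": "2024-01-01T12:00:05", "speed_mph": 61, "pavement": "dry"},
        {"ts": "2024-01-01T12:00:35", "speed_mph": 58, "pavement": "dry"},
    ],
}

# Analytics queries can then aggregate within buckets, e.g. average speed:
avg_speed = (sum(r["speed_mph"] for r in sensor_bucket["readings"])
             / sensor_bucket["count"])
```

Bucketing keeps the write path cheap under high update volume while leaving per-minute documents that aggregation pipelines can roll up efficiently.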
Implementation of Banker’s Algorithm Using Dynamic Modified Approach - rahulmonikasharma
The Banker’s algorithm is a resource allocation and deadlock avoidance algorithm that checks for safety by simulating the allocation of the predetermined maximum possible amounts of resources, keeping the system in a safe state by checking possible deadlock conditions for all other pending processes. It needs to know how much of each resource a process could possibly request. The number of processes is static in the classical algorithm, but in most systems processes vary dynamically, and no additional process can be started while the algorithm is in execution. The number of resources is also not allowed to decrease during execution. In this research, an approach for a Dynamic Banker's algorithm is proposed which allows the number of resources to be changed at runtime while preventing the system from falling into an unsafe state. It also gives details of all the resources and processes: which processes require resources, and in what quantity. It also automatically allocates resources to stopped processes for execution and always gives an appropriate safe sequence for the given processes.
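For reference, the classical safety check that the dynamic variant builds on can be sketched as follows. This is the textbook algorithm, not the paper's dynamic modification:

```python
def is_safe(available, max_demand, allocation):
    """Banker's safety check: return a safe sequence of process
    indices, or None if the state is unsafe."""
    n, m = len(allocation), len(available)  # n processes, m resource types
    need = [[max_demand[i][j] - allocation[i][j] for j in range(m)]
            for i in range(n)]
    work = list(available)
    finished = [False] * n
    sequence = []
    while len(sequence) < n:
        progressed = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):          # process i can finish:
                    work[j] += allocation[i][j]  # release its resources
                finished[i] = True
                sequence.append(i)
                progressed = True
        if not progressed:
            return None  # no process can proceed: unsafe state
    return sequence
```

On the standard textbook example (5 processes, 3 resource types, available = [3, 3, 2]), this returns the safe sequence P1, P3, P4, P0, P2. The proposed dynamic approach would additionally allow `available` to change between checks.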
To view a recording of this webinar, please use the URL below:
http://wso2.com/library/webinars/2016/06/analytics-in-your-enterprise/
Big data spans many fields and brings together technologies like distributed systems, machine learning, statistics and Internet of Things (IoT). It has now become a multi-billion dollar industry with use cases ranging from targeted advertising and fraud detection to product recommendations and market surveys.
Some use cases, such as urban planning, can be slower (done in batch mode), while others, such as the stock market, need results in milliseconds (done in a streaming fashion). Different technologies are used for each case: MapReduce for batch analytics, complex event processing for real-time analytics, and machine learning for predictive analytics. Furthermore, the type of analysis ranges from basic statistics to complicated prediction models.
This webinar will discuss the big data landscape, including:
- Concepts, use cases and technologies
- Capabilities and applications of the WSO2 analytics platform
- WSO2 Data Analytics Server
- WSO2 Complex Event Processor
- WSO2 Machine Learner
Event Processing Using Semantic Web Technologies - Mikko Rinne
The presentation given at the public defence of my doctoral thesis at the Department of Computer Science of Aalto University, Espoo, Finland, on 1 September 2017.
The widespread adoption of Information Technology systems and their capability to trace data about process executions has made Information Technology data available for the analysis of process executions. Meanwhile, at the business level, static and procedural knowledge, which can be exploited to analyze and reason on data, is often available. In this paper we aim at providing an approach that, combining static and procedural aspects, business and data levels, and exploiting semantic-based techniques, allows business analysts to infer knowledge and use it to analyze system executions. The proposed solution has been implemented using current scalable Semantic Web technologies, which offer the possibility to keep the advantages of semantic-based reasoning with non-trivial quantities of data.
The influence of data size on a high-performance computing memetic algorithm ... - journalBEEI
The fingerprint is one kind of biometric. This unique biometric data has to be processed well and securely, and the problem gets more complicated as the data grows. This work processes fingerprint image data with a memetic algorithm, a simple and reliable algorithm. In order to achieve the best result, we run the algorithm in a parallel environment by utilizing the multi-threading feature of the processor. We propose a high-performance computing memetic algorithm (HPCMA) to process a 7200-image fingerprint dataset, which is divided into fifteen specimens based on the image specifications to capture the detail of each image. A combination of each specimen generates a new data variation. The algorithm runs on two different operating systems, Windows 7 and Windows 10, and we measure the influence of data size on the processing time, speed-up, and efficiency of HPCMA with simple linear regression. The results show that data size explains more than 90% of the variation in processing time, more than 30% in speed-up, and more than 19% in efficiency.
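The simple-linear-regression measurement mentioned above can be illustrated with a small sketch computing the least-squares fit and coefficient of determination (the "more than 90%" style of figure). The data values below are made up for illustration, not the paper's measurements:

```python
def linear_fit_r2(x, y):
    """Ordinary least squares for y = slope*x + intercept,
    returning (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

With data sizes `x` and measured processing times `y`, an R^2 above 0.9 would correspond to the paper's claim that data size explains most of the variation in processing time.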
PIDS research slides from MALCON 2018 conference - Asaf Hecht
Research presentation of: Analysis and Detection of Network Printer Attacks.
Presented by Asaf Hecht at "the 13th International Conference on Malicious and Unwanted Software" (MALCON 2018) in Nantucket, USA.
Data-Driven Analysis of Batch Processing Inefficiencies in Business Processes - Marlon Dumas
Slides of a research paper presentation at the 16th International Conference on Research Challenges in Information Science (RCIS).
The research paper presents an approach to analyze event logs of business processes in order to identify batched activities and to analyze the waiting times caused by these activities.
Paper available at: https://link.springer.com/chapter/10.1007/978-3-031-05760-1_14
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat... - Andrii Gakhov
We interact with an increasing amount of data, but classical data structures and algorithms no longer fit our requirements. This talk presents probabilistic algorithms and data structures and describes the main areas of their application.
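A canonical example of the trade-off such structures make is the Bloom filter: constant-size set membership that admits false positives but never false negatives. A minimal sketch (parameters `m` and `k` are illustrative defaults, not from the talk):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array plus k derived hash functions.
    Membership tests may yield false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # Derive k positions by salting one hash function with an index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))
```

The false-positive rate is tuned by sizing `m` and `k` to the expected number of items, which is exactly the kind of accuracy-for-space bargain the talk surveys.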
Digital Document Preservation Simulation - Boston Python User's Group - Micah Altman
Mr. Rick Landau, Research Affiliate, will present a briefing at the Boston Python Users Group meeting on the topic of simulating document loss.
Durable access to information requires insuring against multiple risks such as media failures, format obsolescence, fires, floods, earthquakes, institutional failures, mergers, funding cuts, and malicious insiders. Real libraries need empirical guidance on storage strategies and costs, and vendor claims of reliability are often uninformative or suspect. The talk describes how event-based simulations, coded in Python, are used to simulate document loss under a variety of risk profiles and to provide practical guidance.
This meeting will be held at the Microsoft NERD building, 1 Memorial Drive, and is open to the public. More information is available through the meeting web page:
Charith Perera, Arkady Zaslavsky, Michael Compton, Peter Christen, and Dimitrios Georgakopoulos, Semantic-driven Configuration of Internet of Things Middleware, Proceedings of the 9th International Conference on Semantics, Knowledge & Grids (SKG), Beijing, China, October, 2013
Counting Unique Users in Real-Time: Here's a Challenge for You! - DataWorks Summit
Finding the number of unique users out of 10 billion events per day is challenging. At this session, we're going to describe how re-architecting our data infrastructure, relying on Druid and ThetaSketch, enables our customers to obtain these insights in real-time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices with regard to Druid.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
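The core idea behind the Theta sketches mentioned above can be illustrated with a tiny K-Minimum-Values estimator. This is a conceptual sketch only, not the DataSketches ThetaSketch implementation used with Druid:

```python
import hashlib

def kmv_estimate(items, k=256):
    """K-Minimum-Values distinct-count estimate, the idea behind Theta
    sketches: hash items to [0, 1), keep only the k smallest distinct
    hashes, and let the k-th smallest (theta) imply the total count."""
    hashes = sorted({
        int(hashlib.md5(str(x).encode()).hexdigest(), 16) / 2**128
        for x in items
    })
    if len(hashes) <= k:
        return len(hashes)  # exact when there are few distinct items
    theta = hashes[k - 1]
    return int((k - 1) / theta)  # k-1 items occupy [0, theta)
```

Because only `k` hash values are retained per stream, sketches for different audience segments stay tiny and can be merged, which is what makes criterion-level unique-user counts feasible in real time.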
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Scaling Pattern and Sequence Queries in Complex Event Processing
1. SCALING PATTERN AND SEQUENCE
QUERIES IN COMPLEX EVENT PROCESSING
V. Mohanadarshan
148241N
Supervisors : Dr. Srinath Perera
Dr. Dilum Bandara
June 2nd, 2017
2. Research Contribution
● Goal
Propose an approach to scale pattern and sequence detection in Complex Event
Processing (CEP) to support high event rates.
● Importance
Existing approaches solve only a specific subset of the scalability problems
related to pattern and sequence detection.
● Approach
Time-based event partitioning to scale pattern and sequence detection.
● Results
800% improvement in throughput and reduced event re-ordering, with a slight increase in latency
3. Outline
● Real-time Analytics
● Need for Scaling
● Literature Review
● Methodology
○ Partition Events by Time
○ Handling Event Duplication
○ Event Reordering
● Performance Analysis
● Conclusions
● Future Work
4. Real-time Analytics
● Processing data on the fly (listening to events and detecting patterns) while
storing a minimal amount of information and
responding fast (from <1 ms to a few seconds).
● Built on the idea of event streams: a series of events in
time.
● Enabling technologies
○ Stream Processing (Storm)
○ Complex Event Processing
5. Complex Event Processing
Source: Mark Simms, Microsoft StreamInsight (http://www.slideshare.net/markginnebaugh/microsoft-streaminsight)
7. Pattern and Sequence Detection
● Pattern and sequence detection is the crown jewel of CEP.
● It matches a sequence of events that occur in order and are
correlated based on the values of their attributes.
● Event patterns are implemented using a specialized state machine
approach.
from every (a1 = transactionStream[a1.amountWithdrawed < 100])
    -> a2 = transactionStream[(a1.toAccountNo == a2.fromAccountNo) and (amountWithdrawed > 10000)]
within 5 min
select a1.fromAccountNo as suspectAccountNo
insert into possibleMoneyLaunderingActivityStream;
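The state-machine evaluation above can be sketched in a few lines (a minimal illustration only, not Siddhi's actual implementation; the event fields mirror the query, while the `detect` helper and the sample stream are hypothetical):

```python
# Sketch of state-machine pattern matching for the query above: a small
# withdrawal (< 100) followed, within 5 minutes, by a large withdrawal
# (> 10000) from the account the first transfer went to.

WITHIN_MS = 5 * 60 * 1000  # 'within 5 min'

def detect(events):
    """events: dicts with ts, fromAccountNo, toAccountNo, amountWithdrawed."""
    matches, partial = [], []   # partial holds a1 candidates awaiting an a2
    for e in events:
        # try to advance existing partial matches: a2 must follow a1
        for a1 in list(partial):
            if e["ts"] - a1["ts"] > WITHIN_MS:        # window expired
                partial.remove(a1)
            elif (a1["toAccountNo"] == e["fromAccountNo"]
                  and e["amountWithdrawed"] > 10000):
                matches.append(a1["fromAccountNo"])   # suspectAccountNo
                partial.remove(a1)
        # the 'every' keyword: each qualifying event starts a new a1 state
        if e["amountWithdrawed"] < 100:
            partial.append(e)
    return matches

stream = [
    {"ts": 0,     "fromAccountNo": "A", "toAccountNo": "B", "amountWithdrawed": 50},
    {"ts": 60000, "fromAccountNo": "B", "toAccountNo": "C", "amountWithdrawed": 20000},
]
print(detect(stream))  # ['A']
```

Each qualifying event opens a new partial match, and partial matches are discarded once the 'within' window expires; this per-state bookkeeping is why pattern detection depends on a continuous, ordered event flow.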
8. Important Features in CEP
● High Availability
● Scalability
● Distributed Processing
● Visual Composition
● Performance
● Debugger
9. Need for Scaling
● Scaling: the ability of a CEP system to handle larger or more complex workloads by adding
more resources.
● Most CEP engines run on a single large box, scaling up vertically.
Scaling CEP has several dimensions:
1. Handling a large number of queries
2. Queries that need a large working memory
3. Handling a complex query that might not fit within a single machine
4. Handling a large number of events
● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available:
http://srinathsview.blogspot.com/2012/05/how-to-scale-complex-event-processing.html. [Dec. 23, 2014].
10. How to provide large-scale pattern and
sequence detection in CEP while supporting
high event rates?
12. Common Types of Scaling
● Vertical Scaling
● Horizontal Scaling
13. Partition Based Scaling
● R. Mayer, B. Koldehofe, and K. Rothermel, “Meeting Predictable Buffer Limits in the Parallel Execution of Event Processing Operators,” In Proc. IEEE BigData ‟04,
Washington, USA, Oct 2014, pp. 402–411.
● S. Perera, How to scale Complex Event Processing (CEP) Systems? [online]. Available: http://srinathsview.blogspot.com/2012/05/how-to-scale-complex-event-processing.html.
14. Publisher-Subscriber Based Scaling
● V. Govindasamy and Prof. Dr. P. Thambidura, “An Efficient and Generic Filtering Approach for Uncertain Complex Event Processing,” In Proc. International
Conference on Data Mining and Computer Engineering, Bangkok, Thailand, Dec 2012, pp. 211-216.
17. Scaling by Integrating with ESB
● The key architectural insight in
the system is to separate the
integration functionality of
the ESB from the complex event
processing facilities.
● The ESB is stateless, so it can be
scaled out by adding more
processing nodes.
● The CEP cluster can then be tuned
to handle high throughput and
scaled out separately.
● A. Aalto, “Scalability of Complex Event Processing as a part of a distributed Enterprise Service Bus,” Ph.D. dissertation, Dept. Science., Aalto University, Espoo, 2012.
19. Key Stages of the Solution
● Incoming events are partitioned based on the 'within' value defined in the query.
● The pattern is detected within each partition.
● Duplicate events are removed.
● Events are reordered based on their timestamps.
20. Partition Events by Time
from every h1 = hitStream
    -> h2 = hitStream[h1.pid != pid and h1.tid == tid]
    -> h3 = hitStream[h1.pid == pid]
within 5 seconds
select h1.pid as player1, h2.pid as player2, h3.pid as player3,
       h1.tsr as tStamp1, h2.tsr as tStamp2, h3.tsr as tStamp3
insert into patternMatchedStream;
Here we are looking for the following 3 states:
1. A ball hit from a player x of team 1
2. Then, a ball hit from another player y of the opposing team 2
3. Finally, a ball hit from the same player x who hit first
Moreover, these 3 states need to happen within 5 seconds.
21. Partition Events by Time - Overview
● Incoming events are queued at the entry to the CEP engine.
● The events in the queue are then partitioned based on time values.
● Each partitioned event group is then pushed to one of the parallel-running CEP instances.
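The three steps above can be sketched as follows (a minimal illustration, assuming consecutive partitions overlap by the query's 'within' duration so that a pattern spanning a boundary is still seen whole by at least one instance; the `partition` helper and sample timestamps are hypothetical):

```python
# Sketch of time-based partitioning with overlap: consecutive partitions
# share a 'within'-sized overlap so a cross-boundary sequence lands intact
# in at least one partition. Partition length and timestamps are illustrative.

def partition(events, partition_ms, within_ms):
    """events: (ts, payload) tuples sorted by ts. Returns overlapping slices."""
    if not events:
        return []
    parts, start = [], events[0][0]
    end = start + partition_ms
    while start <= events[-1][0]:
        # include the overlap tail so cross-boundary sequences stay together
        parts.append([e for e in events if start <= e[0] < end + within_ms])
        start, end = end, end + partition_ms
    return parts

events = [(t, "hit%d" % i) for i, t in enumerate([0, 1000, 3500, 4200, 8100])]
parts = partition(events, partition_ms=4000, within_ms=5000)
# parts[0] covers [0, 9000): all five events
# parts[1] covers [4000, 13000): events at 4200 and 8100 (overlap duplicates)
```

Each resulting partition would be dispatched to one of the parallel Siddhi instances; the deliberate overlap is what later produces the duplicate detections that the filtering stage removes.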
23. Event Reordering and Duplication Handling
define stream patternMatchedStream
(player1 string, player2 string, player3 string,
tStamp1 long, tStamp2 long, tStamp3 long);

from patternMatchedStream#window.kslack(10000)
select *
insert into filteredOutputStream;
24. Event Reordering
K-slack based Event Reordering
● K-slack transparently buffers and reorders events before they are processed by event
detectors.
● Buffering and sorting delay the processing of input events by the query operator, thus
increasing the latency of the query results.
● It dynamically adjusts the buffer size to a value big enough to accommodate all late arrivals,
aiming to provide near-exact query results.
● M. Li, M. Liu, L. Ding, E. A. Rundensteiner and M. Mani, “Event Stream Processing with Out-of-Order Data Arrival,” In Proc. 27th International Conference on
Distributed Computing Systems Workshops, Toronto, Canada, Jun 2007, pp. 67.
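A minimal sketch of K-slack reordering (fixed K for brevity, whereas the adaptive variant described above would grow K to cover observed lateness; the `kslack` helper and sample arrivals are illustrative):

```python
import heapq

# Sketch of K-slack reordering: an event with timestamp t is buffered and
# released only once an event with timestamp >= t + K has been seen, so any
# straggler arriving within K time units can still be slotted back in order.

def kslack(events, k):
    """events: (ts, payload) pairs in arrival order; yields them in ts order."""
    buf, max_ts = [], float("-inf")
    for ts, payload in events:
        heapq.heappush(buf, (ts, payload))
        max_ts = max(max_ts, ts)
        # release events old enough that no later straggler can precede them
        while buf and buf[0][0] + k <= max_ts:
            yield heapq.heappop(buf)
    while buf:                      # flush the buffer at end of stream
        yield heapq.heappop(buf)

arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
print(list(kslack(arrivals, k=2)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

The buffering is exactly where the extra per-event latency of the proposed solution comes from: an event waits in the heap until the slack window has safely passed.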
25. Event Duplication Handling
● Event duplication can be handled using a HashSet-based data structure.
● A HashSet is a collection that uses a hash table for storage; a hash
table stores information by using a mechanism called hashing.
● We wrote a hash function for events that returns a hash code
computed from the event's attributes.
● The hash code is then used as the index at which the data associated with
the key is stored.
Figure source: http://computersecuritypsh.wikia.com/wiki/Hash_Function
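The duplicate-filtering step can be sketched as follows (a minimal illustration in which Python's built-in set stands in for Java's HashSet, hashing each key and comparing on collision; the `dedupe` helper and field names are illustrative):

```python
# Sketch of duplicate removal: overlapping partitions can detect the same
# pattern twice, so each match's attributes are turned into a hashable key
# and a set of seen keys filters out the repeats.

def dedupe(matches):
    """matches: dicts of matched-event attributes; returns first occurrences."""
    seen, unique = set(), []
    for m in matches:
        key = frozenset(m.items())  # hashed by the set, compared on collision
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return unique

matches = [
    {"player1": "p7", "player2": "p3", "tStamp1": 1200},
    {"player1": "p7", "player2": "p3", "tStamp1": 1200},  # from overlap region
    {"player1": "p9", "player2": "p1", "tStamp1": 2500},
]
print(len(dedupe(matches)))  # 2
```

Because filtering keys on all attributes (including timestamps), two genuinely distinct matches with identical players but different timestamps are both kept.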
28. Benchmark
The soccer monitoring benchmark is based on the DEBS (Distributed Event Based Systems) 2013
Grand Challenge.
● Data used for this benchmark was collected
by the real-time locating system deployed
on a football field in Germany.
● 47 million events in total.
● The average event size is 365 bytes.
● Every event describes the position of a
given sensor in a 3D coordinate system.
● DEBS Org, DEBS 2013 Grand Challenge: Soccer monitoring [online]. Available: http://debs.org/?p=41. [Jan. 05th, 2017]
29. Evaluation Setup
● Implemented a PoC setup to evaluate the Siddhi CEP engine and our implementation*
● Tests were conducted with Oracle JDK 1.7.0_79-b15
● Hardware configuration:
Property   Value
Cores      32 and 16
Memory     Min: 16 GB, Max: 18 GB
CPU        Intel® Xeon® E5-2470, 2.30 GHz
Cache      L3: 20 MB
* https://github.com/mohanvive/siddhi-2.x
30. Evaluation - Throughput
Throughput improved by 800% in the proposed solution when the Siddhi instance count is 20.
(Left) Throughput of the default Siddhi CEP engine. (Right) Throughput of the proposed solution on multi-core machines.
32. Evaluation - Resource Utilization
(Left) CPU usage in the default WSO2 Siddhi engine. (Right) CPU usage in the proposed solution.
* On a 32-core machine, with 20 Siddhi instances and a 4-second partition time
33. Evaluation - Resource Utilization
(Left) Thread count in the default WSO2 Siddhi engine. (Right) Thread count in the proposed solution.
* On a 32-core machine, with 20 Siddhi instances and a 4-second partition time
34. Evaluation - Accuracy
(Left) Duplicated events (%) vs. Siddhi instance count. (Right) Disordered events (%) vs. Siddhi instance count.
13%-20% of events were duplicated and 3%-11% of events were disordered compared to the
patterns detected by the default Siddhi CEP engine.
35. Evaluation - Latency
(Left) Latency in the default WSO2 Siddhi CEP engine. (Right) Latency in the proposed solution.
Per-event latency increased from 2-3 milliseconds to 8-20 milliseconds (with a Siddhi instance count of 20).
37. Summary
● Proposed a time-based partitioning approach to scale pattern and sequence CEP queries.
● The scaling approach is independent of the internal implementation of a CEP engine.
● Proposed an approach to overcome the event duplication and reordering that arise
from using multiple CEP engines.
● Achieved an 800% improvement in throughput.
● Provides 100% accuracy for use cases expecting 'at-least-once' QoS.
● Evaluated and verified the effectiveness of the solution across various attributes
('within' time, number of Siddhi instances, etc.).
● Can be used to scale other CEP queries that can be partitioned by time.
38. Limitations
● Our proposed solution is not ideal for pattern and sequence
queries with a large 'within' time.
● Due to the buffering and partitioning nature of the solution, pattern detection can be
duplicated and the output might contain duplicate events; it does not suit scenarios
that require 'exactly-once' QoS.
● The number of Siddhi instances is a user-configured value.
● Due to parallel processing, pattern-detected events can get
reordered.
39. Future Work
● Self-tuning the number of Siddhi instances based on hardware resource consumption and other
factors such as throughput and latency.
● Exploring the possibility of scaling pattern queries with a longer 'within' time.
● Implementing the proposed approach in a distributed environment and verifying its effectiveness.
● Exploring other options to remove event duplication and reorder events.