Today, so much data is available from sources such as sensors (RFID, Near Field Communication), web activity, transactions, and social networks that making sense of this avalanche requires efficient and fast processing.
Processing high volumes of events to derive higher-level information is a vital part of making critical decisions, and Complex Event Processing (CEP) has become one of the most rapidly emerging fields in data processing. E-science, business applications, financial trading, operational analytics, and business activity monitoring are some of the use cases that directly use CEP. This paper discusses the design decisions associated with CEP engines and proposes approaches to improve CEP performance by using more stream-processing-style pipelines. Furthermore, the paper discusses Siddhi, a CEP engine that implements those suggestions. We present a performance study showing that the resulting CEP engine, Siddhi, has significantly improved performance.
The primary contributions of this paper are a critical analysis of CEP engine design that identifies suggestions for improvement, an implementation of those improvements in Siddhi, and empirical evidence demonstrating the soundness of those suggestions.
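To make the idea of a stream-processing-style CEP pipeline more concrete, the following is a minimal sketch in Python. It is not Siddhi's API or implementation; the `Event` type, the stage functions, the stream names, and the thresholds are all hypothetical and chosen only for illustration. Events flow through a filter stage and then through a simple "A followed by B within a time limit" pattern stage, and the sketch assumes events arrive in timestamp order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    stream: str        # name of the source stream (hypothetical names below)
    key: str           # correlation key, e.g. a user id
    timestamp: float   # event time in seconds
    value: float = 0.0


Predicate = Callable[[Event], bool]


def filter_stage(events: Iterable[Event], keep: Predicate) -> Iterator[Event]:
    """Pipeline stage: pass on only the events that satisfy the predicate."""
    for event in events:
        if keep(event):
            yield event


def followed_by(
    events: Iterable[Event], first: Predicate, second: Predicate, within: float
) -> Iterator[Tuple[Event, Event]]:
    """Pattern stage: emit (a, b) when an event matching `first` is followed by
    an event matching `second` with the same key, at most `within` seconds later."""
    pending: Dict[str, Event] = {}  # key -> latest unmatched `first` event
    for event in events:
        if first(event):
            pending[event.key] = event
        elif second(event):
            start = pending.pop(event.key, None)
            if start is not None and event.timestamp - start.timestamp <= within:
                yield start, event


def detect_suspicious(events: Iterable[Event]) -> List[Tuple[str, float, float]]:
    """Chain the stages: keep failed logins and large withdrawals, then look for
    a failed login followed by a large withdrawal within 60 seconds."""
    relevant = filter_stage(
        events,
        lambda e: e.stream == "login_failed"
        or (e.stream == "withdrawal" and e.value >= 1000.0),
    )
    matches = followed_by(
        relevant,
        first=lambda e: e.stream == "login_failed",
        second=lambda e: e.stream == "withdrawal",
        within=60.0,
    )
    return [(a.key, a.timestamp, b.timestamp) for a, b in matches]
```

Because each stage is a generator, events are handed from one stage to the next one at a time rather than being collected into intermediate batches, which is the pipeline style the abstract refers to.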
Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework and the Siddhi CEP engine as a complex event processing engine.
A simple presentation about big data stream processing systems such as Spark, Samza, and Storm and the differences between their architectures and purposes. In addition, we discuss streaming-layer tools such as Kafka and RabbitMQ. This presentation refers to the paper
https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Building Reliable Data Lakes at Scale with Delta Lake - Databricks
Most data practitioners grapple with data reliability issues—it’s the bane of their existence. Data engineers, in particular, strive to design, deploy, and serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets.
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Built on open standards, Delta Lake employs co-designed compute and storage and is compatible with Spark APIs. It powers high data reliability and query performance to support big data use cases, from batch and streaming ingestion and fast interactive queries to machine learning. In this tutorial we will discuss the requirements of modern data engineering, the challenges data engineers face when it comes to data reliability and performance, and how Delta Lake can help. Through presentation, code examples, and notebooks, we will explain these challenges and the use of Delta Lake to address them. You will walk away with an understanding of how you can apply this innovation to your data architecture and the benefits you can gain.
This tutorial will be both an instructor-led and a hands-on interactive session. Instructions on how to get tutorial materials will be covered in class.
What you’ll learn:
Understand the key data reliability challenges
How Delta Lake brings reliability to data lakes at scale
Understand how Delta Lake fits within an Apache Spark™ environment
How to use Delta Lake to realize data reliability improvements
Prerequisites
A fully-charged laptop (8-16GB memory) with Chrome or Firefox
Pre-register for Databricks Community Edition
Constrained Optimization with Genetic Algorithms and Project Bonsai - Ivo Andreev
“Traditional machine learning requires volumes of labelled data that can be time consuming and expensive to produce.”
“Machine teaching leverages the human capability to decompose and explain concepts to train machine learning models.”
Machine teaching takes a different direction: the correct answer is taught not by showing the data for it, but by using a person to show the answer.
Project Bonsai is a low-code platform for intelligent solutions, but with a different perspective on data: it allows a completely new approach to tasks, especially when the physical world is involved. Under the hood it combines machine teaching, calibration, and optimization to create intelligent control systems using simulations. The teaching curriculum is written in a new language, “Inkling”, and training a model is easy and interactive.
Streaming Analytics for Financial Enterprises - Databricks
Streaming Analytics (or Fast Data processing) is becoming an increasingly popular subject in the financial sector. There are two main reasons for this development. First, more and more data has to be analyzed in real time to prevent fraud; all transactions that are being processed by banks have to pass an ever-growing number of tests to make sure that the money is coming from and going to legitimate sources. Second, customers want to have frictionless mobile experiences while managing their money, such as immediate notifications and personal advice based on their online behavior and other users’ actions.
A typical streaming analytics solution follows a ‘pipes and filters’ pattern that consists of three main steps: detecting patterns on raw event data (Complex Event Processing), evaluating the outcomes with the aid of business rules and machine learning algorithms, and deciding on the next action. At the core of this architecture is the execution of predictive models that operate on enormous amounts of never-ending data streams.
In this talk, I’ll present an architecture for streaming analytics solutions that covers many use cases that follow this pattern: actionable insights, fraud detection, log parsing, traffic analysis, factory data, the IoT, and others. I’ll go through a few architecture challenges that arise when dealing with streaming data, such as latency issues, event time vs server time, and exactly-once processing. The solution is built on the KISSS stack: Kafka, Ignite, and Spark Structured Streaming. The solution is open source and available on GitHub.
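The three-step ‘pipes and filters’ pattern described in this abstract (detect, evaluate, decide) can be sketched with plain Python generators. This is only an illustration under stated assumptions, not the speaker's architecture and not code for the KISSS stack: the spike rule, the fixed scores standing in for business rules or a model, and all names and thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Transaction = Tuple[str, float]  # (account, amount)


def detect(transactions: Iterable[Transaction]) -> Iterator[dict]:
    """Step 1 (pattern detection): flag a transaction that is more than three
    times the running mean of the account's earlier transactions."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for account, amount in transactions:
        seen = counts[account]
        if seen > 0 and amount > 3 * (totals[account] / seen):
            yield {"account": account, "amount": amount, "pattern": "spike"}
        totals[account] += amount
        counts[account] = seen + 1


def evaluate(candidates: Iterable[dict], high_risk_amount: float = 1000.0) -> Iterator[dict]:
    """Step 2 (evaluation): attach a risk score. A fixed rule stands in here for
    the business rules and machine learning models mentioned in the abstract."""
    for candidate in candidates:
        score = 0.9 if candidate["amount"] >= high_risk_amount else 0.4
        yield {**candidate, "score": score}


def decide(scored: Iterable[dict], block_threshold: float = 0.8) -> Iterator[Tuple[str, float, str]]:
    """Step 3 (decision): choose the next action for each scored candidate."""
    for item in scored:
        action = "block" if item["score"] >= block_threshold else "notify"
        yield item["account"], item["amount"], action


def run_pipeline(transactions: Iterable[Transaction]) -> List[Tuple[str, float, str]]:
    """Connect the three filters with pipes (generator chaining)."""
    return list(decide(evaluate(detect(transactions))))
```

Each filter only knows about the shape of its input and output, so a step can be replaced (for example, swapping the fixed rule for a real model) without touching the other two.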
1) NVIDIA-Iguazio Accelerated Solutions for Deep Learning and Machine Learning (30 mins):
About the speaker:
Dr. Gabriel Noaje, Senior Solutions Architect, NVIDIA
http://bit.ly/GabrielNoaje
2) GPUs in Data Science Pipelines (30 mins)
- GPU as a Service for enterprise AI
- A short demo on the usage of GPUs for model training and model inferencing within a data science workflow
About the speaker:
Anant Gandhi, Solutions Engineer, Iguazio Singapore. https://www.linkedin.com/in/anant-gandhi-b5447614/
Reference architecture for Internet of Things - Sujee Maniyam
What kind of data infrastructure is needed to support the Internet of Things?
This talk presents a reference architecture.
We are actually building this architecture as an open source project. See here: bit.ly / iotxyz
This session is recommended for anyone interested in understanding how to use AWS big data services to develop real-time analytics applications. In this session, you will get an overview of a number of Amazon's big data and analytics services that enable you to build highly scalable cloud applications that immediately and continuously analyze large sets of distributed data. We'll explain how services like Amazon Kinesis, EMR, and Redshift can be used for data ingestion, processing, and storage to enable real-time insights and analysis into customer, operational, and machine-generated data and log files. We'll explore system requirements and design considerations, and walk through a specific customer use case to illustrate the power of real-time insights on their business.
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E... - Henry Saputra
The Krylov Project is the key component in eBay's AI Platform initiative. It provides an easy-to-use, open, and fast AI orchestration engine that is deployed as managed services in the eBay cloud.
Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov compute cluster; and set up machine learning pipelines, for example by using declarative constructs that stitch together the pipeline lifecycle.
Bigger Faster Easier: LinkedIn Hadoop Summit 2015 - Shirshanka Das
We discuss LinkedIn's big data ecosystem and its evolution through the years. We introduce three open source projects, Gobblin for ingestion, Cubert for computation and Pinot for fast OLAP serving. We also showcase our in-house data discovery and lineage portal WhereHows.
Streamlio and IoT analytics with Apache Pulsar - Streamlio
To keep up with fast-moving IoT data, you need technology that can collect, process and store data with performance and scalability. This presentation from Data Day Texas looks at the technology requirements and how Apache Pulsar can help to meet them.
Build Large-Scale Data Analytics and AI Pipeline Using RayDP - Databricks
A large-scale end-to-end data analytics and AI pipeline usually involves data processing frameworks such as Apache Spark for massive data preprocessing, and ML/DL frameworks for distributed training on the preprocessed data. A conventional approach is to use two separate clusters and glue multiple jobs together. Other solutions include running deep learning frameworks in an Apache Spark cluster, or using workflow orchestrators like Kubeflow to stitch distributed programs together. All these options have their own limitations. We introduce Ray as a single substrate for distributed data processing and machine learning. We also introduce RayDP, which allows you to start an Apache Spark job on Ray in your Python program and utilize Ray’s in-memory object store to efficiently exchange data between Apache Spark and other libraries. We will demonstrate how this makes building an end-to-end data analytics and AI pipeline simpler and more efficient.
Real Time Machine Learning Visualization With Spark - Chester Chen
Training a machine learning model involves a lot of experimentation, so we need a way to visualize the training process.
We present a system that enables real-time machine learning visualization with Spark:
-- Gives visibility into the training of a model
-- Allows us to monitor the convergence of the algorithms during training
-- Can stop the iterations when convergence is good enough.
Apache Kafka Streams + Machine Learning / Deep Learning - Kai Wähner
Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams...
Big Data and Machine Learning are key for innovation in many industries today. Large amounts of historical data are stored and analyzed in Hadoop, Spark or other clusters to find patterns and insights, e.g. for predictive maintenance, fraud detection or cross-selling.
The first part of the session explains how to build analytic models with R, Python, and Scala, leveraging open source machine learning / deep learning frameworks like Apache Spark, TensorFlow, or H2O.ai. The second part discusses how to use these analytic models in your own streaming applications or microservices, leveraging the Apache Kafka cluster and Kafka Streams instead of building a separate stream processing cluster. The session focuses on live demos and shares lessons learned for executing analytic models in a highly scalable and performant way.
The last part explains how Apache Kafka can help to move from a manual build and deployment of analytic models to continuous online model improvement in real time.
Complex Event Processing (CEP) for Next-Generation Security Event Management,... - Tim Bass
Complex Event Processing (CEP) for Next-Generation Security Event Management, Fraud and Intrusion Detection, April 17, 2007 (First Draft), London, Tim Bass, CISSP, Director, Principal Global Architect
Emerging Technologies Group
Extending Spark Streaming to Support Complex Event Processing - Oh Chan Kwon
In this talk, we introduce extensions of Spark Streaming that support (1) SQL-based query processing and (2) elastic, seamless resource allocation. First, we explain how we support window queries and query chains. Last year, Grace Huang and Jerry Shao introduced the concept of “StreamSQL”, which processes streaming data with SQL-like queries by adapting SparkSQL to Spark Streaming. Building on their efforts, we made advances in supporting complex event processing (CEP). In detail, we implemented the sliding window concept to support time-based streaming data processing at the SQL level. To reduce the aggregation time of large windows, we generate an efficient query plan that computes partial results by evaluating only the data entering or leaving the window and then obtains the current result by merging the previous result with the partial ones. Next, to support query chains, we made the result of a query over streaming data a table by adding the “insert into” query. That is, stream queries can be applied to the results of other stream queries.
Second, we explain how we allocate resources to streaming applications dynamically, which enables the applications to meet a given deadline. As the rate of incoming events varies over time, the resources allocated to applications need to be adjusted for high resource utilization. However, Spark's current resource allocation features are not suitable for streaming applications: allocated resources are not freed while new data keeps arriving at the streaming application, even when the quantity of new data is very small. To resolve this problem, we consider resource utilization. If the utilization is low, we choose victim nodes to be killed, and we stop feeding new data into the victims so that killing them does not trigger a needless recovery. Accordingly, we can scale resources in and out seamlessly.
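The incremental window aggregation described in this abstract (updating the previous result using only the data entering or leaving the window) can be illustrated with a small Python sketch. This is not the authors' Spark Streaming extension or its query plan; the class and function names are hypothetical, the aggregate is a plain sum, and events are assumed to arrive in timestamp order. The window covers the interval (t - window_seconds, t] for the latest timestamp t.

```python
from collections import deque
from typing import Deque, Iterable, Tuple


class SlidingWindowSum:
    """Time-based sliding window sum maintained incrementally."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Add one event and return the sum over the current window."""
        # Data entering the window: merge it into the previous result.
        self._events.append((timestamp, value))
        self._total += value
        # Data leaving the window: subtract it from the previous result.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, expired_value = self._events.popleft()
            self._total -= expired_value
        return self._total


def naive_window_sum(events: Iterable[Tuple[float, float]], window_seconds: float) -> float:
    """Reference version: rescan every event on each call (for comparison only)."""
    events = list(events)
    latest = events[-1][0]
    return sum(value for timestamp, value in events if timestamp > latest - window_seconds)
```

Each event is appended once and removed once, so the work per update does not grow with the window size, whereas the naive version rescans all events every time. With floating-point values, repeated addition and subtraction can accumulate rounding error over a long-running stream, so a production version might periodically recompute the total from the buffered events.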
Complex Event Processing in Practice at jDays 2012 - Peter Norrhall
The increasing demand for real-time monitoring and decision making requires complex event processing (CEP) architectures, frameworks, and tools. In this presentation Peter will introduce you to the concept of CEP and in particular event stream analysis, the typical use cases, how it relates to and can be implemented as event sourcing, a comparison of a couple of open source frameworks (Storm and Disruptor), and a comprehensive overview of Esper.
Seminar about Semantic Complex Event Processing and Reaction RuleML presented at the School of Computer Science at McGill University on Sept. 9th, 2013 as part of the Transatlantic Business Process Management Education Network (http://bpmedu.net/) and presented at the DemAAL 2013 - Dem@Care Summer School on Ambient Assisted Living, 16-20 September 2013, Chania, Crete, Greece.
Event Driven Architecture (EDA), November 2, 2006 - Tim Bass
Event Driven Architecture (EDA), SOA Seminar Crystal City, Virginia, November 2nd, 2006, Tim Bass, CISSP, Principal Global Architect, Director. Co-Chair, Event Processing Reference Architecture Working Group (EPRAWG)
Semantic Complex Event Processing at Sem Tech 2010 - Adrian Paschke
Semantic Complex Event Processing - The Future of Dynamic IT
Presentation by Paul Vincent, Adrian Paschke, Harold Boley
at the RuleML Semantic Rules Track of the Semantic Technologies Conference 2010 (SemTech 2010), San Francisco, CA, USA
http://semtech2010.semanticuniverse.com/rules
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com... - WSO2
In this webinar, Sriskandarajah Suhothayan, technical lead at WSO2, will take a closer look at the following use cases:
Natural language processing capabilities of WSO2 CEP: Introducing basic constructs of the CEP
Analyzing a soccer game in real time: Explaining how complicated scenarios can be implemented
Geo fencing capabilities of WSO2 CEP: Focusing on the CEP’s virtualization support
C*ollege Credit: CEP Distributed Processing on Cassandra with Storm - DataStax
Cassandra provides facilities to integrate with Hadoop. This is sufficient for distributed batch processing, but doesn’t address CEP distributed processing. This webinar will demonstrate the use of Cassandra in Storm. Storm provides a data flow and processing layer that can be used to integrate Cassandra with other external persistence mechanisms (e.g. Elasticsearch) or to calculate dimensional counts for reporting and dashboards. We’ll dive into a sample Storm topology that reads from and writes to Cassandra using storm-cassandra bolts.
It is an exciting time in computing, with a sea change happening on both the technology and application fronts. Networked sensors and embedded platforms with significant computational capabilities and access to backend utility computing resources offer a tremendous opportunity to realize large-scale cyber-physical systems (CPS) that address many societal challenges, including emergency response, disaster recovery, surveillance, and transportation. Referred to as situation awareness applications, they are latency-sensitive and data-intensive, involve heavy-duty processing, run 24x7, and result in actuation with possible retargeting of sensors. Examples include surveillance using large-scale distributed camera networks, and personalized traffic alerts in vehicular networks using road and traffic sensing. This talk covers ongoing research in Professor Ramachandran’s embedded pervasive lab to provide system support for the Internet of Things.
Introduction to streaming data, the difference between batch processing and stream processing, research issues in streaming data processing, performance evaluation metrics, and tools for stream processing.
The world is moving from a model where data sits at rest, waiting for people to make requests of it, to where data is constantly moving and streams of data flow to and from devices with or without human interaction. Decisions need to be made based on these streams of data in real-time, models need to be updated, and intelligence needs to be gathered. In this context, our old-fashioned approach of CRUD REST APIs serving CRUD database calls just doesn't cut it. It's time we moved to a stream-centric view of the world.
https://jonthebeach.com/speakers/71/Markus+Eisele
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022 - HostedbyConfluent
Event-first thinking and streaming help organizations transition from followers to leaders in the market. A reliable, scalable, and economical streaming architecture helps them get there.
This talk first explores the "classic streaming stack," based on the Lambda architecture, its origin, and why it didn't pick up amongst data-driven organizations. The modern streaming stack (MSS) is a lean, cloud-native, and economical alternative to classic streaming architectures that aims to make event-driven real-time applications viable for organizations.
The second half of the talk explores the MSS in detail, including its core components, their purposes, and how Kappa architecture has influenced it. Moreover, the talk lays out a few considerations before planning a new streaming application within an organization. The talk concludes by discussing the challenges in the streaming world and how vendors are trying to overcome them in the future.
A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
Distributed Data Processing for Real-time Applications - ScyllaDB
What are the elements of a modern distributed application architecture? What are the fundamentals and programming patterns of event processing? What’s a data mesh? Is it the best way to propagate state across distributed systems? Discover the answers to these questions and more from distributed systems expert Maheedhar Gunturu.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi - DataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor when considering the variety of data sources that need to be collected and analyzed. Everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows, 95% of which do real-time ETL, and handles over 20 billion events per day. In this session, learn from Aetna’s experience building an edge-to-AI high-speed data pipeline with Apache NiFi.
High Availability HPC ~ Microservice Architectures for Supercomputing - inside-BigData.com
In this deck from the Stanford HPC Conference, Ryan Quick from Providentia Worldwide presents: High Availability HPC ~ Microservice Architectures for Supercomputing.
"Microservices power cloud-native applications to scale thousands of times larger than single deployments. We introduce the notion of microservices for traditional HPC workloads. We will describe microservices generally, highlighting some of the more popular and large-scale applications. Then we examine similarities between large-scale cloud configurations and HPC environments. Finally we propose a microservice application for solving a traditional HPC problem, illustrating improved time-to-market and workload resiliency."
Watch the video: https://insidehpc.com/2018/02/high-availability-hpc-microservice-architectures-supercomputing/
Learn more: http://www.providentiaworldwide.com/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The adoption of container native and cloud native development practices presents new operational challenges. Today’s microservice environments are polyglot, distributed, container-based, highly-scalable, and ephemeral. To understand your system, you need to be able to follow the life of a request across numerous components distributed in multiple environments. Without the proper tools it can feel impossible to determine a root cause of an issue. This requires a new approach to operations. We will review a series of open source observability tools for logging, monitoring, and tracing to help developers achieve operational excellence for running container-based workloads.
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
Think you have big data? What about high availability
requirements? At DataDog we process billions of data points every day including metrics and events, as we help the world
monitor the their applications and infrastructure. Being the world’s monitoring system is a big responsibility, and thanks to
Redis we are up to the task. Join us as we discuss how the DataDog team monitors and scales Redis to power our SaaS based monitoring offering. We will discuss our usage and deployment patterns, as well as dive into monitoring best practices for production Redis workloads
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scaleDataScienceConferenc1
Rivian makes adventurous electric vehicles with a mission of a sustainable planet and keeping the world adventurous forever. Rivian's vehicles are born in the cloud and embody tenets of a software defined vehicle, where not only the user accessible features such as infotainment are software driven and updated, but also internals aspects such as vehicle dynamics. Real-time instrumentation and telemetry are the key underpinnings that make all this possible. Rivian has built a cutting-edge Real-time stack using a combination of open-source technologies like Kafka, Flink and Druid and in house services. This talk will go into how these are combined and leveraged to deliver real-time analytics.
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
3 Things to Learn About:
*Building scalable real time architectures for managing data from IoT
*Processing data in real time with components such as Kudu & Spark
*Customer case studies highlighting real-time IoT use cases
The latest distributed system utilizing the cloud is a very complicated configuration in which the components span a plurality of components. Applications for customers are part of products, and service quality targets directly linked to business indicators are needed. Legacy monitoring system based on traditional system management is not linked not only to business indicators but also to measure service quality. Google advocates the idea of site reliability engineering (SRE) and introduces efforts to measure quality of service. Based on the concept of SRE, the service quality monitoring system collects and analyzes logs from various components not only application codes but also whole infrastructure components. Since very large amounts of data must be processed in real time, it is necessary to design carefully with reference to the big data architecture. To utilize this system, you can measure the quality of service, and make it possible to continuously improve the service quality.
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
Building a Real-Time Security Application Using Log Data and Machine Learning- Karthik Aaravabhoomi
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Similar to Siddhi: A Second Look at Complex Event Processing Implementations (20)
Siddhi: A Second Look at Complex Event Processing Implementations
1. A SECOND LOOK AT COMPLEX
EVENT PROCESSING
Srinath Perera,
Senior Software Architect WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
http://siddhi.sourceforge.net/
2. Topics
• Introduction to CEP
• Second look at CEP Implementations
• Siddhi Architecture
• Siddhi Performance
• Conclusion and Future
Photo by John Trainor on Flickr, http://www.flickr.com/photos/trainor/2902023575/, licensed under CC
3. Complex Event Processing (CEP) is identifying meaningful patterns, relationships and data abstractions among unrelated events and firing an immediate response.
Aspect | Database Applications | Event-driven Applications
Query paradigm | Ad-hoc queries or requests | Continuous standing queries
Latency | Seconds, hours, days | Milliseconds or less
Data rate | Hundreds of events/sec | Tens of thousands of events/sec or more
[Diagram: a database application takes a request and returns a response; an event-driven application takes an input stream and produces an output stream.]
4. CEP Target Scenarios
[Figure: application classes plotted by latency (from < 1 ms up to months) against aggregate data rate (from 0 to ~1M events/second). Classes shown: Relational Database Applications, Operational Analytics Applications, Data Warehousing Applications, Web Analytics Applications, Monitoring Applications, Manufacturing Applications, Financial Trading Applications.]
5. • E-Science often deals with national- and global-scale use cases as we try to better understand the world around us.
• The following is an important class of E-Science applications:
  - Receive data about the world around us from sensors deployed across the country/world
  - Try to make sense of, react to, predict, and/or control the world around us
  - Examples: weather, traffic data, surveillance, smart grid, etc.
• CEP is a powerful enabling technology for these use cases
Photos: http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo and http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, licensed under CC
6. • Each event consists of properties (name-value pairs)
• We separate different events into streams
• We use a SQL-like query language, but queries are evaluated on continuous event streams
• The types of queries are as follows:
  - Selection (filtering) and projection (e.g. like select in SQL)
  - Windows – events are processed within a window (e.g. for aggregation and joins). Time window vs. length window.
  - Ordering – sequences and patterns (before and followed-by conditions, e.g. a new location followed by a small and then a large purchase might suggest fraud)
  - Join and split
  - Aggregation
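To make the "time window vs. length window" distinction concrete, here is a minimal Python sketch. It is not Siddhi's implementation; the class names `LengthWindow` and `TimeWindow` are hypothetical, and the time window assumes events arrive with non-decreasing timestamps.

```python
from collections import deque


class LengthWindow:
    """Keeps only the most recent `size` events (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.events = deque()

    def add(self, event):
        """Add an event and return the events pushed out of the window."""
        self.events.append(event)
        expired = []
        while len(self.events) > self.size:
            expired.append(self.events.popleft())
        return expired


class TimeWindow:
    """Keeps events newer than `span` time units relative to the latest event."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, event) pairs in arrival order

    def add(self, timestamp, event):
        """Add an event and return the events that have aged out."""
        self.events.append((timestamp, event))
        expired = []
        # Assumes timestamps are non-decreasing, so the oldest event is at the left.
        while self.events and self.events[0][0] <= timestamp - self.span:
            expired.append(self.events.popleft()[1])
        return expired
```

A length window bounds how many events are held, while a time window bounds how old they may be; both report expired events so that downstream aggregations or joins can react to removals.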
7. EmirateDeal = Select * from Emirate where prize < 600
UnitedDeal = Select * from United where prize < 600
PotentialDeal = Select * from EmirateDeal join UnitedDeal
where UnitedDeal.dest = EmirateDeal.dest
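The three queries above can be read as two filters feeding a join. The following Python sketch mimics that logic over small in-memory lists; it is only an illustration of the semantics, not how a CEP engine evaluates continuous queries. The helper names and the sample data are made up, and the attribute name `prize` is kept as spelled on the slide.

```python
def select_where(stream, predicate):
    """Selection: keep only the events that satisfy the predicate."""
    return [event for event in stream if predicate(event)]


def join_on(left, right, key):
    """Equi-join: pair every left event with right events sharing the same key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


# Made-up sample events for the two input streams.
emirate = [{"dest": "DXB", "prize": 550}, {"dest": "LHR", "prize": 700}]
united = [{"dest": "DXB", "prize": 580}, {"dest": "LHR", "prize": 590}]

emirate_deal = select_where(emirate, lambda e: e["prize"] < 600)
united_deal = select_where(united, lambda e: e["prize"] < 600)
potential_deal = join_on(emirate_deal, united_deal, "dest")
```

Only the DXB fares pass both filters, so `potential_deal` pairs the two DXB events; the LHR fare from Emirate is filtered out before the join.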
8.
9.
10. • A very good reference: G. Cugola and A. Margara, “Processing flows of information: From data stream to complex event processing,” ACM Computing Surveys, 2011.
• Initial work came from databases and active databases, but the “store and process” model was too slow.
• Stream processing uses a pub/sub model; it is very scalable but lacks temporal support (Aurora [5, 2], PIPES [23, 7], STREAM [30], Borealis [6, 1], S4 [25], and Storm).
• CEP engines
  - Mainly use an NFA-based model
  - SASE (data flow and NFA)
  - Esper (NFA and delta networks)
  - Cayuga (NFA and re-subscriptions; single thread)
11. • Borrowing ideas from stream processing: a pipeline-based architecture with a pub/sub model
• Breaking single-thread evaluation: improving parallelism in processing through a pipeline-based model
• A new state machine implementation that avoids checking a series of conditions
• Improved support for query chaining with complex queries
12.
13.
14. • Uses a pipeline architecture
• Consists of a pipeline of processors (which support all operators except windows)
• Processors are connected by queues (which implement windows)
• Processors take inputs from a queue by subscribing to it and place results in an output queue
• Many threads are assigned to processors; they take data from the input queues, process it, and place the results in the output queues.
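The processor-and-queue arrangement described above can be sketched in a few lines of Python. This is a simplified illustration rather than Siddhi's code: it assigns exactly one thread per processor (the slide describes many threads shared across processors), and `STOP`, `run_processor`, and `run_pipeline` are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream (illustrative)


def run_processor(func, inbox, outbox):
    """Take events from the input queue, process them, put results on the output queue."""
    while True:
        event = inbox.get()
        if event is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        result = func(event)
        if result is not None:  # None means the event was filtered out
            outbox.put(result)


def run_pipeline(events, stages):
    """Chain stages with queues, run one thread per stage, and collect the output."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_processor, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for event in events:
        queues[0].put(event)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def cheap(event):
    """Filter stage: pass only events with prize below 600."""
    return event if event["prize"] < 600 else None


def project_dest(event):
    """Projection stage: keep only the destination."""
    return event["dest"]


out = run_pipeline(
    [{"dest": "DXB", "prize": 550}, {"dest": "LHR", "prize": 700}],
    [cheap, project_dest],
)
```

Because each stage only talks to its neighbours through queues, stages can run concurrently on different events, which is the parallelism the pipeline model is meant to provide.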
15.
16. EmirateDeal = Select * from Emirate where prize < 600
UnitedDeal = Select * from United where prize < 600
PotentialDeal = Select * from EmirateDeal join UnitedDeal
where UnitedDeal.dest = EmirateDeal.dest
17. • Evaluates the tree, with optimizations that stop the evaluation as early as possible
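One common way to stop evaluation early is short-circuiting over a tree of conditions. The sketch below assumes the tree in question is a boolean condition tree with `and`/`or` nodes, which the slide does not state explicitly; the tuple-based node layout and the `trace` argument are purely illustrative.

```python
def evaluate(node, event, trace=None):
    """Evaluate a condition tree against an event, stopping as early as possible.

    Nodes are tuples (an assumed layout for this sketch):
      ("cond", name, predicate)  leaf condition
      ("and", child, ...)        false as soon as one child is false
      ("or", child, ...)         true as soon as one child is true
    """
    kind = node[0]
    if kind == "cond":
        _, name, predicate = node
        if trace is not None:
            trace.append(name)  # record which leaves were actually evaluated
        return predicate(event)
    if kind == "and":
        # all() over a generator stops at the first False child.
        return all(evaluate(child, event, trace) for child in node[1:])
    if kind == "or":
        # any() over a generator stops at the first True child.
        return any(evaluate(child, event, trace) for child in node[1:])
    raise ValueError(f"unknown node type: {kind!r}")


tree = (
    "and",
    ("cond", "cheap", lambda e: e["prize"] < 600),
    ("cond", "to_dxb", lambda e: e["dest"] == "DXB"),
)
trace = []
result = evaluate(tree, {"dest": "DXB", "prize": 700}, trace)
```

Here the first condition fails, so the destination check is never run: `trace` records only `"cheap"`.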
18. • Implemented as a queue
• Uses events to notify processors of event additions to and removals from the window, using a pub/sub pattern
19. • Siddhi supports avg, sum, count, max, min
• These are implemented within the processor, and each addition or removal of an event updates the value (if the aggregation is window-based, the queues notify subscribers when things change).
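The idea of updating an aggregate on every addition and removal, driven by notifications from a window queue, can be sketched as follows. This is an assumption-laden illustration rather than Siddhi's code: `RunningAverage`, `NotifyingLengthWindow`, and the `on_add`/`on_remove` callback names are hypothetical.

```python
from collections import deque


class RunningAverage:
    """Maintains avg incrementally: each add/remove is O(1), no rescans."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def on_add(self, value):
        self.total += value
        self.count += 1

    def on_remove(self, value):
        self.total -= value
        self.count -= 1

    @property
    def value(self):
        return self.total / self.count if self.count else None


class NotifyingLengthWindow:
    """A length window kept as a queue that publishes additions and removals."""

    def __init__(self, size):
        self.size = size
        self.events = deque()
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def put(self, value):
        self.events.append(value)
        for s in self.subscribers:
            s.on_add(value)
        if len(self.events) > self.size:
            expired = self.events.popleft()
            for s in self.subscribers:
                s.on_remove(expired)


window = NotifyingLengthWindow(3)
avg = RunningAverage()
window.subscribe(avg)
for price in (10, 20, 30):
    window.put(price)
first = avg.value   # average of 10, 20, 30
window.put(40)      # 10 leaves the window
second = avg.value  # average of 20, 30, 40
```

Sum and count fit this scheme directly; max and min need extra bookkeeping on removal (for example a sorted structure), which this sketch does not cover.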
20. • Implemented as a processor that moves data from several queues to a different queue after joining
• When an event arrives, if it matches the join condition we send the joined events to the next queue; otherwise we keep the event to match against future events.
• We keep events as long as the window condition allows (this is done by keeping references to the queues and checking within them for matches)
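A windowed join along these lines can be sketched with one bounded queue per input. The class `WindowJoin` below is hypothetical and simplified: it uses length windows via `deque(maxlen=...)` and retains every arriving event in its window whether or not it matched, which differs slightly from the description above.

```python
from collections import deque


class WindowJoin:
    """Joins two streams on `key`, matching only against events still in the windows."""

    def __init__(self, key, window_size):
        self.key = key
        # deque(maxlen=n) silently drops the oldest event once n is exceeded.
        self.windows = {
            "left": deque(maxlen=window_size),
            "right": deque(maxlen=window_size),
        }

    def on_event(self, side, event):
        """Process an event from `side` ('left' or 'right'); return joined pairs."""
        other = "right" if side == "left" else "left"
        matches = [
            (event, candidate) if side == "left" else (candidate, event)
            for candidate in self.windows[other]
            if candidate[self.key] == event[self.key]
        ]
        self.windows[side].append(event)  # keep it for future events to match
        return matches
```

For example, with `join = WindowJoin("dest", 2)`, a left event for DXB followed by a right event for DXB yields one pair; once two newer left events have pushed the DXB event out of the left window, a further right DXB event yields nothing.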
21. • Patterns and sequences handle series of events like A->B->C; there are two classes: patterns and sequences
• Patterns match events while ignoring events in between, whereas sequences match exactly. You can use star (*) with sequences to mimic patterns
• Supported through a state machine (NFA)
22. • Uses an event model
• Say we need A->B->C; we break it down into a list of listeners, each of which, when triggered, removes itself and registers the next one in the list.
• We apply different variations of this to support * and sequences.
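The listener chain described above can be sketched as follows. This is a minimal illustration of the pattern case (unrelated events in between are ignored) and tracks a single in-flight match; `PatternMatcher` and its callback are hypothetical names, not Siddhi's API.

```python
class PatternMatcher:
    """Matches steps[0] -> steps[1] -> ... while ignoring unrelated events in between."""

    def __init__(self, steps, on_match):
        self.steps = steps        # list of predicates, one per step
        self.on_match = on_match  # called with the list of matched events
        self.listeners = []
        self._register(0, [])

    def _register(self, index, matched):
        def listener(event):
            if not self.steps[index](event):
                return  # not the awaited event: stay registered and keep waiting
            self.listeners.remove(listener)  # triggered: the listener removes itself
            captured = matched + [event]
            if index + 1 < len(self.steps):
                self._register(index + 1, captured)  # register the next in the list
            else:
                self.on_match(captured)
                self._register(0, [])  # re-arm for the next occurrence

        self.listeners.append(listener)

    def on_event(self, event):
        # Iterate over a copy because listeners modify the list when triggered,
        # and a newly registered listener must not see the current event.
        for listener in list(self.listeners):
            listener(event)


matches = []
matcher = PatternMatcher(
    [lambda e: e == "A", lambda e: e == "B", lambda e: e == "C"],
    matches.append,
)
for event in ["A", "X", "B", "C"]:
    matcher.on_event(event)
```

At any moment only the listener for the next expected step is registered, so each incoming event is checked against one condition rather than the whole series. A strict sequence variant would instead discard the partial match when a non-matching event arrives.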
23. • We compared Siddhi with Esper, the widely used open source CEP engine
• For the evaluation, we set up different queries using both systems, pushed events into each system, and measured the time until all of them were processed.
• We used an Intel(R) Xeon(R) X3440 @ 2.53GHz (4 cores, 8M cache) with 8GB RAM, running Debian with the 2.6.32-5-amd64 kernel
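The measurement procedure above (push events in, time until all are processed) can be sketched as a small harness. This is not the benchmark used in the study; `measure_throughput` is a hypothetical helper, and the lambda stands in for whatever engine is being measured.

```python
import time


def measure_throughput(process, events):
    """Push all events through `process`; return (elapsed_seconds, events_per_second)."""
    start = time.perf_counter()
    for event in events:
        process(event)
    elapsed = time.perf_counter() - start
    rate = len(events) / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


# Stand-in "engine": a simple filter that collects matching events.
matched = []
events = [{"dest": "DXB", "prize": p} for p in range(1000)]
elapsed, rate = measure_throughput(
    lambda e: matched.append(e) if e["prize"] < 600 else None, events
)
```

For an engine that processes events asynchronously, the timer would need to stop only after the last output has been observed, not when the last input has been submitted.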
27. • The following describes Siddhi in contrast to Esper, using the comparisons defined in Cugola et al.
• In terms of Functional and Processing Models and Data, Time, and Rule Models (see Tables I and III of Cugola et al.), Esper and Siddhi behave the same.
• In terms of the supported language model (Table IV):
  - Esper supports Pane and Tumble windows, user-defined operators, and parameterization, while Siddhi does not support them
  - Siddhi supports removal of duplicates, which is not supported by Esper
• Both support all other language constructs.
28. • Improvements in the selection operator using MVEL
• Support for conditions on the aggregation functions: having, group-by
• Event quantification in sequential patterns: Kleene closure (*)
• Unique function
29. • "Los Angeles Smart Grid Demonstration Project": it forecasts electricity demand, responds to peak load events, and improves sustainable use of energy by consumers.
http://ceng.usc.edu/~simmhan/pubs/simmhan-usctr2011-smartgridinformatics.pdf
• OpenMRS NCD module: the idea is to detect and notify patients when certain conditions have occurred.
https://wiki.openmrs.org/display/docs/Notifiable+Condition+Detector+%28NCD%29+Module
• WSO2 CEP Server (Perspective)
30. • SQL-like query language for Siddhi
• Building a scalable processing network using Siddhi nodes
  - Siddhi is currently a Java library; build a Thrift-based server for Siddhi
  - Supporting a network of Siddhi processors (clustering)
  - Query partitioning and optimization for scale
• Support for Pane and Tumble windows, user-defined operators, and parameterization
• Available under the Apache License
http://www.flickr.com/photos/garryknight/3650151941/
31. • We have resource allocations and development efforts to continue next year
• Looking forward to building an open source community around it
• Looking for use cases, real data feeds, and of course guinea pigs