The presentation I gave at WU Vienna on 18/2/2016. I discuss the problem of unifying existing solutions for processing semantic streams, with a particular focus on the ones that perform continuous query answering over RDF streams.
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'... - Ververica
Stream Processing is emerging as a popular paradigm for data processing architectures, because it handles the continuous nature of most data and computation and gets rid of artificial boundaries and delays.
The fact that stream processing is gaining rapid adoption is also due to more powerful and maturing technology (much of it open source at the ASF) that has solved many of the hard technical challenges.
We discuss Apache Flink's approach to high performance stream processing with state, strong consistency, low latency, and sophisticated handling of time. With such building blocks, Apache Flink can handle classes of problems previously considered out of reach for stream processing. We also take a sneak peek at the next steps for Flink.
Days In Green (DIG): Forecasting the life of a healthy service - Arun Kejariwal
This document describes Twitter's Days In Green (DIG) methodology for forecasting how long a service will remain healthy before it exceeds a predefined capacity threshold. It involves collecting time series data on a service's key performance metric, detecting anomalies and breakouts, fitting an ARIMA model to capture trends and seasonality, and forecasting the number of days before the threshold is breached to determine capacity needs. The methodology has been deployed at Twitter to help plan capacity for hundreds of services and detect those nearing disaster recovery thresholds.
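DIG itself fits an ARIMA model to the metric; as a hedged sketch of the underlying idea, the fragment below substitutes a plain linear trend and solves for the day the fitted line crosses the threshold (function and variable names are illustrative, not Twitter's actual API):

```python
# Simplified sketch of the DIG idea: given a capacity metric's daily
# history, estimate the days remaining until it crosses a threshold.
# DIG fits an ARIMA model; a plain least-squares trend stands in here.

def days_in_green(history, threshold):
    """history: one observation per day; returns days until the fitted
    trend crosses `threshold`, or None if the trend is flat/declining."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # metric not growing: no breach forecast
    # Solve intercept + slope * t = threshold for t, relative to today.
    t_breach = (threshold - intercept) / slope
    return max(0.0, t_breach - (n - 1))

# A service at 50% capacity growing ~2 points/day breaches 80% in 8 more days.
usage = [50 + 2 * d for d in range(8)]
print(days_in_green(usage, 80.0))  # → 8.0
```

The same shape carries over when the linear fit is replaced by an ARIMA forecast: the "days in green" is simply the first forecast horizon at which the predicted metric breaches the threshold.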
This document discusses tools and methodologies for analyzing and resolving questions about why certain fire units were or were not dispatched to emergency responses. It provides examples of common questions, such as why an address was incorrect or why a closer unit was not chosen, and the steps and analyses that can be used to determine the root causes and answer the questions. These include checking unit locations, response times, road networks and speeds, and identifying potential issues with AVL systems, call entry processes, or CAD recommendations.
Ontology based top-k query answering over massive, heterogeneous, and dynamic... - Daniele Dell'Aglio
This document discusses ontology-based top-k continuous query answering over streaming data from multiple heterogeneous sources. It aims to investigate how ontologies and top-k queries can improve continuous query processing by exploiting ordering. The research will analyze state of the art solutions, define an evaluation framework, and assess the effects on correctness and performance of techniques that integrate stream reasoning and top-k queries. Preliminary results include an extension of an RDF stream processor testbench and a case study on real-time social media analytics.
Augmented Participation to Live Events through Social Network Content Enrichm... - Daniele Dell'Aglio
The document describes ECSTASYS, a system that captures social media content related to live events and enriches it to provide more context and value for event attendees. ECSTASYS retrieves tweets about an event, filters irrelevant ones, identifies event-related entities, associates tweets with specific event sub-topics, and visualizes the information organized by event. It uses a knowledge base derived from event schedules and ontologies to link tweets to the correct event components to provide a more holistic view of the complex live event through social media.
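As a hedged sketch of that pipeline's shape (function names, keyword matching and sub-topic linking below are illustrative stand-ins for ECSTASYS's actual knowledge-base-driven techniques):

```python
# Toy version of the ECSTASYS stages: filter irrelevant tweets, link the
# rest to an event sub-topic, and group them for visualization.

def relevant(tweet, event_keywords):
    # Stand-in for the relevance filter (real system uses richer features).
    return any(k in tweet.lower() for k in event_keywords)

def link_to_subtopic(tweet, subtopics):
    # Naive entity spotting: first sub-topic whose label occurs in the tweet.
    for name, labels in subtopics.items():
        if any(label in tweet.lower() for label in labels):
            return name
    return "general"

def enrich(tweets, event_keywords, subtopics):
    by_subtopic = {}
    for t in tweets:
        if relevant(t, event_keywords):
            by_subtopic.setdefault(link_to_subtopic(t, subtopics), []).append(t)
    return by_subtopic

tweets = ["Great keynote at #conf!", "Lunch was ok",
          "Demo session is packed #conf"]
out = enrich(tweets, ["#conf"], {"keynote": ["keynote"], "demos": ["demo"]})
print(out)
# → {'keynote': ['Great keynote at #conf!'], 'demos': ['Demo session is packed #conf']}
```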
This document reports the results of unit root tests on several time series variables. It presents the Augmented Dickey-Fuller (ADF) test statistic for each variable and lags of each variable, and compares these to critical values at various significance levels to test the null hypothesis of each variable having a unit root. The tests were run on variables including GM1, GM2, GMB, GM1ISL, GM2ISL, GMBISL, GCPI, GCREDIT, GLIKUID, GCREDIT ISL, and GLIKUID ISL.
XSPARQL is a query language that allows querying of both XML and RDF data sources simultaneously. It extends the syntax of XQuery with a SPARQL-for clause to query RDF data and a CONSTRUCT clause to produce RDF output. XSPARQL 1.1 supports SPARQL 1.1 operators like aggregation, federation, negation and property paths. It also allows processing of JSON files. The XSPARQL evaluator takes an XSPARQL query, rewrites it, optimizes it, and executes it using XQuery and SPARQL engines to retrieve and combine data from different sources into a unified XML or RDF answer.
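As a hedged illustration of the SPARQL-for clause described above (the syntax is approximated from the XSPARQL proposal; the file name and prefix are placeholders), a query interleaving XQuery and SPARQL might look like:

```
prefix foaf: <http://xmlns.com/foaf/0.1/>

<people>{
  (: SPARQL-for clause: iterate over the solutions of a graph pattern :)
  for $person $name from <relations.rdf>
  where { $person foaf:name $name }
  return <person uri="{$person}">{$name}</person>
}</people>
```

The evaluator would rewrite the SPARQL-for clause into calls to a SPARQL engine and feed the resulting bindings back into the surrounding XQuery expression.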
Our environment consists of living and non-living components that interact in complex ways. Humans rely on healthy ecosystems, but our activities have disrupted natural cycles and caused pollution. Key issues include climate change from greenhouse gas emissions, which risks a runaway warming effect. While scientists agree human activity contributes to current warming trends, fully predicting climate impacts remains challenging due to its complexity. Maintaining sustainable resource use requires awareness of our footprint on ecological systems.
The document discusses logging frameworks and describes key concepts such as declaration and naming of loggers, different logging levels, appenders for output, and layouts. It uses Log4j as an example logging framework and shows how to configure loggers, levels, and appenders in the properties file. Code examples are provided to illustrate logger declaration and usage.
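For concreteness, a minimal Log4j 1.x properties configuration of the kind described (the logger and package names are illustrative) might look like:

```properties
# Root logger at INFO, routed to the "stdout" appender
log4j.rootLogger=INFO, stdout

# Console appender with a pattern layout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n

# Per-package logger with a more verbose level (package name is illustrative)
log4j.logger.com.example.dao=DEBUG
```

In code, a logger is conventionally declared once per class, e.g. `private static final Logger log = Logger.getLogger(MyClass.class);`, and messages are emitted through level methods such as `log.debug(...)` and `log.error(...)`.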
This document discusses an empirical study of RDF stream processing systems. The study aimed to understand why different systems can produce different outputs for the same inputs. Through experiments, the study found that differences could be explained by parameters like the starting time (t0) of windows in continuous queries. A more detailed model called SECRET was then developed to describe stream processing and help predict system outputs. This led to the CSR-bench benchmark for evaluating and comparing RDF stream reasoning systems.
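The t0 effect the study identifies can be illustrated with a small sketch: the same sliding-window specification over the same stream yields different window contents when the first window opens at a different instant (a simplified model, not any particular engine's implementation):

```python
# Time-based sliding windows of a given width and slide, parameterized
# by t0, the opening time of the first window.

def windows(events, width, slide, t0, until):
    """events: list of (timestamp, item). Returns the contents of each
    window [start, start + width) with start = t0, t0 + slide, ..."""
    result = []
    start = t0
    while start + width <= until:
        result.append([item for ts, item in events
                       if start <= ts < start + width])
        start += slide
    return result

stream = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]
print(windows(stream, 4, 2, t0=0, until=8))
# → [['a', 'b', 'c'], ['b', 'c', 'd', 'e'], ['d', 'e', 'f']]
print(windows(stream, 4, 2, t0=1, until=8))
# → [['a', 'b', 'c', 'd'], ['c', 'd', 'e', 'f']]
```

Two engines that agree on width and slide but pick different t0 values will report different answers for the same query, which is exactly the kind of divergence SECRET was built to explain.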
Galaxy Blimps LLC was established in 2000 and has expanded its operations since 2006 through various milestones and certifications. The company provides aerial broadcasting, movie and TV production, aerial surveillance, military, university research, and other applications through its fleet of blimps like the HD60 and HD75 blimps. Galaxy Blimps sees future potential for unmanned airships in heavy cargo lift, high altitude radar platforms, atmospheric research, and communications relay.
This document discusses correctness in benchmarking RDF stream processors. It proposes a common model for the operational semantics of these systems called CSR and an extension to an existing benchmark called CSR-bench that focuses on correctness. CSR-bench includes an oracle to automatically validate correctness and a test suite. Experiments with three systems showed incorrect behaviors related to window initialization, slide parameters, window contents and timestamps. The work aims to improve understanding and assessment of these systems through a shared test environment.
This document provides an overview of multi-viewing screen installations and includes case studies of corporate boardrooms, family rooms, and dual-purpose installations. It discusses the benefits of designing multi-purpose rooms with increased functionality using a single space instead of two and the appeal of unique designs. Reasons for commercial installations include simultaneous screens for video conferencing, data, webinars and news monitoring as well as options for single-screen presentations based on group size and monitoring markets and news from executive offices.
Attachment C Company Overview Catalog Of Services - GalaxyBlimps
Galaxy Blimps LLC is a provider of unmanned aircraft systems for defense, civil, and commercial applications. They have over 40 years of combined experience in various unmanned platforms. Their services include pre-program planning, flight services, design and engineering, production, system documentation, training, payload integration, best practices, and staffing assistance. They have experience supporting defense programs and agencies like DHS, DoJ, and FBI.
The document discusses augmented Dickey-Fuller (ADF) tests for detecting unit roots in time series data. It presents the three possible forms of the ADF test regression and describes how to determine the appropriate model. A procedure is outlined for selecting between models and testing whether time series contain deterministic trends, constants, or unit roots. The document also provides instructions for performing ADF tests in EViews software, including specifying the test regression, lag length, and interpreting the test results.
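As a hedged sketch of how the test statistic is constructed (the constant-only form of the regression, without the Dickey-Fuller critical-value tables that EViews supplies automatically): regress the first difference on a constant, the lagged level, and lagged differences, and read off the t-statistic on the lagged-level coefficient.

```python
# ADF regression (constant, no trend): Δy_t = α + ρ·y_{t-1} + Σ γ_i·Δy_{t-i} + ε_t.
# The test statistic is the t-ratio of ρ̂; very negative values reject a unit root.
import numpy as np

def adf_stat(y, lags=1):
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    rows, target = [], []
    for t in range(lags, len(dy)):
        # Design row: constant, lagged level, then `lags` lagged differences.
        rows.append([1.0, y[t]] + [dy[t - i] for i in range(1, lags + 1)])
        target.append(dy[t])
    X, z = np.array(rows), np.array(target)
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ b
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(cov[1, 1])  # t-stat on the y_{t-1} coefficient

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=500))  # has a unit root
stationary = rng.normal(size=500)              # does not
print(adf_stat(random_walk), adf_stat(stationary))
```

The stationary series produces a strongly negative statistic (well below the roughly -2.9 critical value at 5%), while the random walk's statistic stays near zero, so the unit-root null is not rejected for it.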
The document discusses revision control systems and their main concepts and operations. It describes how revision control allows for backup of files, sharing of work, and cooperative development. The key operations covered are checkout, commit, update, and revert. It also discusses branches, tags, and distributed version control systems.
The document discusses unit testing and the JUnit framework. It defines unit testing as testing individual units or modules of code in isolation to determine if they work as expected. JUnit is introduced as a unit testing framework for Java. Key concepts covered include test cases, test fixtures, test suites, annotations for setup and teardown like @Before and @After, and best practices for test-driven development. Examples are provided of writing test cases using JUnit to test a TreeNode class and its methods.
This document discusses solid waste management issues in India. It notes that rapid urbanization, neglect by authorities, and public apathy have led to a garbage crisis. To address this, authorities must implement proper waste management systems as per regulations by treating waste via composting, anaerobic digestion, or other technologies. The document outlines several waste treatment options and recommends that vermicomposting is suitable for individual homes, composting is best for medium capacities, and anaerobic digestion is appropriate for large volumes of waste. Effective waste management requires proper collection, transportation, treatment, disposal and public awareness.
Brief report about the contents of the Stream Reasoning workshop at ISWC 2016. Additional info about the event is available at: http://streamreasoning.org/events/sr2016
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces... - Daniele Dell'Aglio
This document proposes a query model called RSEP-QL to capture event pattern matching in RDF stream processing languages. It presents RSEP-QL's data model of RDF streams and windows, basic operators like EVENT and SEQ, and evaluation semantics. The goal is to provide a reference model for comparing different RSP query languages and studying related problems in a standardized way.
The talk I gave at the Stream Reasoning workshop at TU Berlin on December 8. I give an overview of RSEP-QL and how it can capture and formalise the behaviour of existing RSP engines, e.g. C-SPARQL, EP-SPARQL, CQELS and SPARQLstream.
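The SEQ operator mentioned above can be sketched informally: it joins the matches of two patterns whenever the first match strictly precedes the second (a toy illustration of the idea, not RSEP-QL's formal evaluation semantics):

```python
# Toy SEQ: combine matches of pattern A with matches of pattern B that
# occur strictly later. Each match is a (timestamp, binding) pair.

def seq(matches_a, matches_b):
    return [((ta, a), (tb, b))
            for ta, a in matches_a
            for tb, b in matches_b
            if ta < tb]

door_open = [(1, "door42")]
motion = [(0, "hall"), (3, "hall")]
print(seq(door_open, motion))  # → [((1, 'door42'), (3, 'hall'))]
```

In RSEP-QL the operator is additionally scoped by windows over the RDF stream, so only event pairs falling in the window under evaluation are combined.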
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E... - Riccardo Tommasini
This master's thesis describes the Heaven framework for enabling systematic comparative research of RDF stream processing engines. The research question is whether a test stand, using existing queries, datasets and metrics, can support such comparative research. The author developed Heaven as an open-source test stand and used it to evaluate four baseline RDF stream processing engines. Heaven provides extensible components, methods for layered analysis, and tools to identify patterns and enable visual comparison of engine performance. The thesis concludes that Heaven supports comparative research of RDF stream processing engines and identifies opportunities for further development and for evaluating additional engines.
SDRule-L: Managing Semantically Rich Business Decision Processes - Christophe Debruyne
The Semantic Decision Rule Language (SDRule-L) is an extension of the Object-Role Modelling (ORM) language, one of the most popular fact-based graphical modelling languages for designing information systems. In this paper, we discuss how SDRule-L models can be formalized, analysed and applied in a business context. An SDRule-L model may contain static rules (e.g., data constraints) and dynamic rules (e.g., sequences of events). A reasoning engine is created for detecting inconsistencies. When an SDRule-L model is used to manage linked data, a feasible approach is to align SDRule-L with Semantic Web languages such as OWL. To achieve this, we propose to map dynamic rules into a combination of static rules and queries for detecting anomalies. We illustrate a model reification algorithm for automatically transforming SDRule-L models that contain dynamic rules into models containing only static rules, which can be formalized in Description Logic.
Published as: Yan Tang Demey, Christophe Debruyne: SDRule-L: Managing Semantically Rich Business Decision Processes. EC-Web 2013: 59-67
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea... - Flink Forward
Apache Beam is Flink’s sibling in the Apache family of stream processing frameworks. The Beam and Flink teams work closely together on advancing what is possible in stream processing, including streaming SQL extensions and code interoperability on both platforms.
Beam was originally developed at Google as the amalgamation of its internal batch and streaming frameworks to power the exabyte-scale data processing for Gmail, YouTube and Ads. It now powers the fully managed, serverless Google Cloud Dataflow service, and is also available to run in other public clouds and on-premises when deployed in portability mode on Apache Flink, Spark, Samza and other runners. Users regularly run distributed data processing jobs on Beam spanning tens of thousands of CPU cores and processing millions of events per second.
In this session, Sergei Sokolenko, Cloud Dataflow product manager, and Reuven Lax, the founding member of the Dataflow and Beam team, will share Google’s learnings from building and operating a global stream processing infrastructure shared by thousands of customers, including:
safe deployment to dozens of geographic locations,
resource autoscaling to minimize processing costs,
separating compute and state storage for better scaling behavior,
dynamic work rebalancing of work items away from overutilized worker nodes,
offering a throughput-optimized batch processing capability with the same API as streaming,
grouping and joining of hundreds of terabytes in a hybrid in-memory/on-disk file system,
integrating with the Google Cloud security ecosystem, and other lessons.
Customers benefit from these advances through faster execution of jobs, resource savings, and a fully managed data processing environment that runs in the Cloud and removes the need to manage infrastructure.
Foundations of streaming SQL: stream & table theory - DataWorks Summit
The document provides an overview of streaming SQL and time-varying relations. It discusses:
1) How relations evolve over time in streaming SQL, with data divided into time intervals. This allows querying the relation at any point in time.
2) The closure properties of relational algebra still apply to time-varying relations. Operations like filtering and grouping can be performed on intervals of the relation.
3) Streaming SQL extends classic SQL to handle continuous queries over streaming data, represented as time-varying relations divided into time-based intervals.
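The snapshot view in points 1) to 3) can be sketched in a few lines: a time-varying relation is a function from time to an ordinary relation, and any classic query lifts to it pointwise, which is the closure property the talk relies on (a toy model, not an actual streaming SQL engine):

```python
# A time-varying relation as a function t -> relation, built from a log
# of timestamped insertions; classic queries lift to it snapshot-by-snapshot.

def time_varying(events):
    """events: list of (timestamp, row). Returns a function t -> relation,
    the list of rows inserted at or before t."""
    def at(t):
        return [row for ts, row in events if ts <= t]
    return at

def lift(query):
    """Lift a classic relation-to-relation query to time-varying relations."""
    return lambda tvr: (lambda t: query(tvr(t)))

scores = time_varying([(1, ("alice", 3)), (2, ("bob", 5)), (4, ("alice", 4))])
total = lift(lambda rel: sum(points for _, points in rel))
print(total(scores)(1), total(scores)(2), total(scores)(4))  # → 3 8 12
```

Querying the lifted relation at successive instants is exactly what a continuous query does; a streaming engine just avoids recomputing each snapshot from scratch.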
Distributed systems often replicate data across multiple servers for performance and reliability. To keep replicas consistent, conflicting operations like write-write must be ordered the same everywhere. However, guaranteeing a global order is costly and hurts scalability. Weaker consistency models address this by relaxing the consistency requirements. Client-centric models like monotonic reads and writes ensure users see their own updates regardless of which server they access.
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
OLAP Basics and Fundamentals by Bharat Kalia Bharat Kalia
The document discusses online analytical processing (OLAP) and the need for OLAP capabilities beyond basic data analysis. It describes how OLAP uses multidimensional data models and pre-computed aggregates to provide fast and interactive analysis of data across multiple dimensions. Different approaches for implementing OLAP like ROLAP, MOLAP, and hybrid systems are covered.
Hybrid Fuzzy Sliding Mode Controller for Timedelay Systemijaia
This document describes a hybrid fuzzy sliding mode controller for time-delay systems. It begins by introducing time-delay systems and discussing sliding mode control as a suitable technique. It then presents the design of a sliding surface for the error function of a nonlinear time-delay system. Next, it describes constructing a fuzzy logic controller by designing a fuzzy rule base using the generated error signals. Simulation results found the proposed scheme to be robust even with perturbed system parameters. The aim is to develop an effective control algorithm for highly unstable nonlinear systems such as aerospace systems.
The document proposes an approach for efficient semantically enriched complex event processing and pattern matching. It introduces a temporally annotated RDF named graph data model to represent event streams. A query model is presented that decomposes queries into subqueries over event patterns. These subqueries are rewritten and processed in parallel and distributed manner. The proposed approach aims to integrate stream reasoning and complex event processing by supporting new operators for RDF graph patterns and allowing the use of techniques like NFA and EDG for pattern matching.
The document summarizes computational aspects of vehicle routing problems. It discusses time complexity and space complexity, and how they are measured as functions of problem size. It provides examples of calculating complexity for different algorithms. It also discusses common data structures for representing routes, including array lists, doubly linked lists, and their pros and cons for different operations. The document outlines Java code examples for comparing route representation using these data structures.
Multi-Perspective Comparison of Business Processes Variants Based on Event LogsMarlon Dumas
This document presents a method for multi-perspective comparison of business process variants based on event logs. The method involves constructing perspective graphs from different abstractions of event logs to analyze processes from different perspectives based on event attributes. Differential perspective graphs are then used to identify statistically significant differences between two event logs, representing different process variants. The method was experimentally applied to compare differences between divisions in an IT incident handling process using various abstractions and observations. The experiments revealed differences in activity statuses, control flows between countries, and control flow frequencies over time between the divisions.
The document discusses various software metrics that can be used to measure attributes of software products and processes. It describes metrics for size (e.g. lines of code), complexity (e.g. cyclomatic complexity), quality (e.g. defects per KLOC), design (e.g. coupling and cohesion), and object-oriented software (e.g. weighted methods per class). The goals of metrics include estimating costs, evaluating quality, and improving processes and products.
IMaRS - Incremental Materialization for RDF Streams (SR4LD2013)Daniele Dell'Aglio
This document discusses incremental materialization for RDF streams (IMaRS). IMaRS is an approach for incremental reasoning over sliding windows of RDF streams. It avoids recomputing the entire materialization when the window slides by tracking expiration times and computing only the changes (additions and removals) needed for the new materialization. The maintenance is done through execution of a logic program that uses contexts to build the delta sets for updating the materialization incrementally as new data enters the window.
Principles in Data Stream Processing | Matthias J Sax, ConfluentHostedbyConfluent
Data stream processing is, for many of us, a new paradigm with which you process data and build applications. In this talk, we will take you on a journey through the theoretical foundations of stream processing and discuss the underlying principles and unique problems that need to be addressed. What actually is a data stream anyway? And how do I use it? How do streams relate to application state and when do I use the one or the other?
ksqlDB and Kafka Streams are both, at their core, designed to help build stream processing applications, and we will explain how stream processing principles are reflected in the design of each system and what trade-offs were chosen (and - more importantly! - why). Finally, we take a look into the future at how the stream processing space, and in particular ksqlDB and Kafka Streams, may evolve over the next few years, as we outline extensions and improvements to the underlying conceptual model. So, bring your thinking hats and notepads and prepare to learn WHY these systems are the way they are!
The document discusses software metrics and regression testing. It defines software metrics as quantitative methods for assessing software quality. Metrics can be measured through techniques applied to the software development lifecycle and products. This allows for providing meaningful management information and improving processes. Types of metrics discussed include size, control flow, and data metrics. Regression testing involves re-running existing test suites on modified programs to identify new issues. The document also provides examples of measuring complexity for a sample program using metrics like lines of code, cyclomatic complexity, and Halstead metrics to analyze effort.
Spark Summit EU talk by Herman van HovellSpark Summit
This document provides a summary of Apache Spark's Catalyst optimizer:
- Catalyst optimizes user programs expressed using SQL, DataFrames, or Datasets by automatically finding the most efficient execution plan. It represents programs as trees and applies transformations to these trees.
- The key steps in Catalyst are analysis, logical optimization, physical planning, and code generation. Analysis resolves the logical plan using the catalog. Logical optimization applies rule-based transformations. Physical planning selects physical operators and ensures requirements are met.
- Transformations are implemented using partial functions that match and replace patterns in trees. Multiple rules are combined using a rule executor. Strategies transform the logical plan to physical operators. The planner also
On Relevant Query Answering over Streaming and Distributed DataShima Zahmatkesh
This document discusses optimizing query evaluation over streaming and distributed data to continuously obtain relevant results while maintaining system reactiveness. It proposes approaches for queries with filter clauses and top-k queries. For queries with filters, maintenance policies like Filter Update Policy and combined policies improve performance. For top-k queries, the Super-MTK+N list and Top-k+N algorithm handle changes to distributed data. The AcquaTop framework applies different maintenance policies. Experimental results show the approaches achieve more relevant and accurate results than the state-of-the-art.
On the need to include functional testing in RDF stream engine benchmarks Emanuele Della Valle
The document discusses the need to include functional testing in benchmarks for RDF stream engines to verify correctness of results. It presents an example query and shows that different engines can produce different correct results due to variations in operational semantics. This highlights the importance of modeling engine semantics and developing an "oracle" for benchmarks to check results against expected output. The conclusions advocate extending existing benchmarks with correctness testing while also measuring performance metrics like throughput.
Similar to On Unified Stream Reasoning - The RDF Stream Processing realm (20)
This document discusses methods for distributed stream consistency checking against a conceptual model. It presents the problem of ensuring streaming data complies with an ontology model while dealing with noise and large volumes. Two methods - NTM and LN - are proposed and evaluated. The LN method models the negative inclusion axioms in the ontology as a pipeline of bolts, reducing the load on individual bolts compared to NTM and improving performance up to 300%. Future work is discussed around more expressive languages, inconsistency repair, and implementation on other stream processing engines.
The presentation I gave at Linköping University about web stream processing. I discuss two problems: (i) exchanging data streams on the web, and (ii) combining streams and contextual quasi-static data on the web
Triplewave: a step towards RDF Stream Processing on the WebDaniele Dell'Aglio
The slides of my talk at INSIGHT Centre for Data Analytics (in NUI Galway) where I presented TripleWave (http://streamreasoning.github.io/TripleWave/), an open-source framework to create and publish streams of RDF data.
The document provides an overview of RDF stream processing, including:
- Extending the RDF data model to represent RDF streams and associate application times to data items
- Modeling continuous query evaluation over RDF streams using the CQL/STREAM model of mapping streams to relations and using sliding windows
- How existing systems extend CQL with operators for mapping between RDF streams and relations and for evaluating continuous SPARQL queries over windows of streaming RDF data.
This document presents a survey of temporal extensions of description logics (DLs) conducted by Daniele Dell'Aglio, Fariz Darari and Davide Lanti. It begins with an overview and outline of the topics that will be covered, including a running example to model how to become a doctor. The paper then surveys existing solutions for extending DLs with temporal aspects, including state-change based DLs, temporal DLs with an internal approach, point-based temporal DLs and interval-based temporal DLs. It concludes with a discussion of current hot topics and future directions for research on temporal extensions of DLs.
Presentation on RDF Stream Processing models given at the SR4LD tutorial (ISWC 2013) -- updated version at: http://www.slideshare.net/dellaglio/rsp2014-01rspmodelsss
Maven is a build automation tool that favors convention over configuration. It utilizes a project object model (POM) file that defines project coordinates, dependencies, plugins, and repositories. Maven projects follow a standard directory structure and use lifecycles made up of phases to execute goals like compiling, testing, packaging, and deploying. It retrieves dependencies and plugins from repositories, caching artifacts locally for reuse.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and see what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, and whose client base keeps growing, making scaling and performance aspects life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol, based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, some going as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
On Unified Stream Reasoning - The RDF Stream Processing realm
1. Daniele Dell’Aglio
On Unified Stream Reasoning
The RDF Stream Processing realm
Daniele Dell’Aglio
WU Vienna, 18/02/2016
2. Daniele Dell’Aglio
Problem setting
Real-time integration of huge volumes of dynamic data from heterogeneous sources
– Traffic Prediction
– Social media analytics
– Personalised services
3. Daniele Dell’Aglio
Stream Reasoning
Stream Reasoning (SR): inference over streams of data
– Stream and Event Processing: real-time processing of highly dynamic data
• Aggregations, filters
• Complex event detection
– Reasoning
• Access and integration of heterogeneous data
• Make hidden information explicit
5. Daniele Dell’Aglio
The initial problem (1)
Where are Alice and Bob, when they are together?
Let’s consider a tumbling window W(ω=β=5)
Let’s execute the experiment 4 times
Execution | 1st answer | 2nd answer
1 | :hall [6] | :kitchen [11]
2 | :hall [5] | :kitchen [10]
3 | :hall [6] | :kitchen [11]
4 | - [7] | - [12]
[Timeline figure: the stream S with items S1–S4 at t = 1, 3, 6, 9, carrying the triples {:alice :isIn :hall}, {:bob :isIn :hall}, {:alice :isIn :kitchen}, {:bob :isIn :kitchen}; the window is annotated with its width and slide.]
Which is the correct answer?
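The alignment issue above can be reproduced in a few lines. The sketch below is not from the talk; the data shapes and helper names are illustrative assumptions. It runs a tumbling window of width 5 over the example stream and shows that shifting where the first window starts changes the answers, matching executions 1 and 4 of the table.

```python
# Minimal sketch (illustrative, not from the talk): a tumbling window of
# width 5 over a timestamped stream, showing that the answer depends on
# where the windows are aligned on the time axis.
stream = [
    (1, (":alice", ":isIn", ":hall")),
    (3, (":bob", ":isIn", ":hall")),
    (6, (":alice", ":isIn", ":kitchen")),
    (9, (":bob", ":isIn", ":kitchen")),
]

def together(window_items):
    """Where are Alice and Bob when both are in the same place?"""
    places = {}
    for _, (subj, _, place) in window_items:
        places.setdefault(place, set()).add(subj)
    return [p for p, who in places.items() if {":alice", ":bob"} <= who]

def tumbling_windows(stream, width, t0, t_end):
    """Yield (close_time, items) for tumbling windows [o, o+width) from t0."""
    o = t0
    while o < t_end:
        yield o + width, [(t, d) for (t, d) in stream if o <= t < o + width]
        o += width

# Same query, different window alignments, different answers:
# t0=1 -> [(6, [':hall']), (11, [':kitchen'])]; t0=2 -> [(7, []), (12, [])]
for t0 in (1, 2):
    print(t0, [(c, together(i)) for c, i in tumbling_windows(stream, 5, t0, 11)])
```

The query semantics is identical in both runs; only the (usually unspecified) window alignment differs, which is exactly why different engines can legitimately disagree.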
6. Daniele Dell’Aglio
The initial problem (2)
Which system behaves in the correct way?
System 1:
Execution | 1st answer | 2nd answer
1 | :hall [6] | :kitchen [11]
2 | :hall [5] | :kitchen [10]
3 | :hall [6] | :kitchen [11]
4 | - [7] | - [12]
System 2:
Execution | 1st answer | 2nd answer
1 | :hall [3] | :kitchen [9]
2 | No answers
3 | :hall [3] | :kitchen [9]
4 | No answers
[Timeline figure: the stream S with items S1–S4 at t = 1, 3, 6, 9, carrying the triples {:alice :isIn :hall}, {:bob :isIn :hall}, {:alice :isIn :kitchen}, {:bob :isIn :kitchen}.]
7. Daniele Dell’Aglio
Problem
How to unify current Stream Reasoning techniques?
Why do we need it?
• Comparison and contrast
• Interoperability
• Study RDF Stream Processing related problems
• Standard RSP query language
8. Daniele Dell’Aglio
[Architecture figure: Applications sit on top of RSEP-QL; RSEP-QL adds event pattern detection operators to RSP-QL, a model to express continuous queries, with BGP evaluation over streams and BGP evaluation over the background knowledge; the inputs are streams, an ontology and background data. The entailment regimes require an ontology and provide more answers w.r.t. both RSP-QL and RSEP-QL (not part of today's talk).]
Contribution – RSEP-QL
A comprehensive model that formally defines the semantics of RDF Stream Processing engines
9. Daniele Dell’Aglio
From SPARQL…
[Figure: a SPARQL query Q = (E, DS, QF) enters through a query interface; the evaluator applies the algebraic expression E over the data layer, i.e. the dataset DS of RDF graphs, and the result formatter QF produces the answer Ans(Q).]
10. Daniele Dell’Aglio
…to RSEP-QL
[Figure: the SPARQL query Q = (E, DS, QF) is extended step by step: the dataset DS becomes a streaming dataset SDS fed by RDF graphs and RDF streams, giving Q = (E, SDS, QF); the evaluator becomes a continuous evaluator driven by a set of evaluation time instants ET, giving Q = (E, SDS, ET, QF); finally the expression E becomes a streaming expression SE, giving Q = (SE, SDS, ET, QF).]
17. Daniele Dell’Aglio
From SPARQL dataset to RSEP-QL Streaming Dataset
[Figure: a SPARQL dataset contains fixed RDF graphs (G, H). In an RSP-QL dataset a time-varying graph is a function G: T → R, where T ⊆ ℕ is the set of time instants and R the set of RDF graphs; its value G(t1) at an instant t1 is an instantaneous graph. Streams S1, S2, … enter the dataset through window operators 𝕎(S).]
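The time-varying graph of the figure can be sketched as a plain function from time instants to RDF graphs. The representation below is an assumption (cumulative triple insertions, triples as tuples) made only to illustrate G: T → R and the instantaneous graph G(t):

```python
# Sketch (assumed representation, not from the talk): a time-varying graph G
# as a function from time instants T ⊆ ℕ to RDF graphs (sets of triples);
# G(t) is the instantaneous graph at instant t.
def time_varying_graph(updates):
    """updates: list of (time, triple) insertions; G(t) holds all triples
    inserted at or before t (a simplifying cumulative assumption)."""
    def G(t):
        return frozenset(triple for ts, triple in updates if ts <= t)
    return G

G = time_varying_graph([
    (1, (":alice", ":isIn", ":hall")),
    (6, (":alice", ":isIn", ":kitchen")),
])
print(G(3))  # instantaneous graph at t=3: only the first triple
```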
18. Daniele Dell’Aglio
Evaluation
The SPARQL evaluation function is defined as ⟦P⟧_DS(G)
The RSEP-QL evaluation function extends the SPARQL one by introducing the evaluation time instant: ⟦P⟧^t_SDS(A)
SPARQL operators are straightforwardly extended to the new evaluation function
Example: JOIN
⟦JOIN(P1, P2)⟧^t_SDS(A) = ⟦P1⟧^t_SDS(A) ⨝ ⟦P2⟧^t_SDS(A)
19. Daniele Dell’Aglio
Instantaneous evaluation
The main difference is in the BGP evaluation:
⟦BGP⟧^t_SDS(A) = ⟦BGP⟧_SDS(A,t)
SDS(A,t) is:
– SDS(G,t) = SDS(G(t)) if A is a time-varying graph G
– SDS(𝕎(S),t) = SDS(m(𝕎(S,t))) if A is from a sliding window 𝕎
– SDS(𝕃(S),t) = SDS(m(𝕃(S,t))) if A is from a landmark window 𝕃
where m denotes a merge function:
m(𝕎(S,t)) = ⋃_{(d_i, t_i) ∈ 𝕎(S,t)} d_i
– it takes as input a window content, i.e. a sequence of timestamped RDF graphs
– and produces an RDF graph
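A minimal sketch of the merge function m above, assuming a window content represented as a list of (RDF graph, timestamp) pairs with graphs as sets of triples:

```python
# Sketch of the merge function m from the slide: the window content is a
# sequence of timestamped RDF graphs (d_i, t_i); m unions them into a single
# RDF graph so that plain BGP evaluation can run over it. (Assumed shapes.)
def merge(window_content):
    """window_content: iterable of (graph, timestamp); returns the union graph."""
    merged = set()
    for graph, _t in window_content:
        merged |= graph
    return merged

w = [
    ({(":alice", ":isIn", ":hall")}, 1),
    ({(":bob", ":isIn", ":hall")}, 3),
]
print(merge(w))  # union of the two timestamped graphs
```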
20. Daniele Dell’Aglio
Continuous evaluation
For each evaluation time t ∈ ET: ⟦SE⟧^t_SDS(A)
– The continuous evaluation is a sequence of instantaneous evaluations
It is not always possible to compute ET a priori
– It can be data dependent
– ET is expressed through a Report Policy
A Report Policy is a set of conditions associated with one or more window operators in the SDS
– Initially defined in SECRET for Stream Processing engines
21. Daniele Dell’Aglio
Continuous evaluation – Report Policies
Report Policy examples:
– P Periodic: the window reports only at regular intervals
– WC Window Close: the window reports if the active window closes
– CC Content Change: the window reports if the content changes
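The three policies can be phrased as simple predicates. The helper names below are hypothetical; they only illustrate how a report policy induces the set ET of evaluation time instants:

```python
# Sketch (hypothetical helper names) of the three report policies above,
# each deciding whether a window operator reports at time t.
def periodic(t, interval):
    """P: report only at regular intervals."""
    return t % interval == 0

def window_close(t, window_open, width):
    """WC: report when the active window [window_open, window_open+width) closes."""
    return t == window_open + width

def content_change(current_content, previous_content):
    """CC: report when the window content changed since the last evaluation."""
    return current_content != previous_content

# ET, the set of evaluation time instants, is induced by the chosen policy:
ET_periodic = [t for t in range(1, 16) if periodic(t, 5)]
print(ET_periodic)  # [5, 10, 15]
```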
22. Daniele Dell’Aglio
Event Processing – Basic Event Pattern
Support for Complex Event Processing operators
The minimal element is the Basic Event Pattern: EVENT_w P
Intuitively, the Basic Graph Pattern P should match against one stream item of the window identified by w
BEPs can be combined through complex operators
• SEQ, LAST, EVERY
Example:
EVENT_w1 P1 SEQ EVERY EVENT_w2 P2
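The intuition behind BEPs and SEQ can be sketched in code. This is a deliberate simplification (a single window, triples as tuples, variables written as "?x" strings), not RSEP-QL's full semantics: a BEP matches a triple pattern against one stream item at a time, and SEQ pairs a left match that ends strictly before a right match begins, producing event mappings (μ, t1, t2):

```python
# Simplified sketch (assumed semantics, not the formal RSEP-QL definition).
def match_item(pattern, triple):
    """Match a triple pattern (variables start with '?') against one triple."""
    mu = {}
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if mu.get(p, v) != v:
                return None
            mu[p] = v
        elif p != v:
            return None
    return mu

def bep(pattern, window_items):
    """EVENT_w P: the pattern matches against one stream item of the window."""
    for t, triple in window_items:
        mu = match_item(pattern, triple)
        if mu is not None:
            yield mu, t, t

def seq(left, right):
    """E1 SEQ E2: a left match strictly before a right match, with compatible
    mappings; yields merged event mappings (mu, t1, t2)."""
    out = []
    for mu1, a1, b1 in left:
        for mu2, a2, b2 in right:
            if b1 < a2 and all(mu1.get(k, v) == v for k, v in mu2.items()):
                out.append(({**mu1, **mu2}, a1, b2))
    return out

window = [(11, (":alice", ":isIn", ":hall")),
          (13, (":alice", ":isIn", ":kitchen"))]
matches = seq(list(bep((":alice", ":isIn", "?x"), window)),
              list(bep((":alice", ":isIn", "?y"), window)))
print(matches)  # one match: ?x=:hall before ?y=:kitchen, interval [11, 13]
```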
23. Daniele Dell’Aglio
Event Processing – Evaluation semantics
Formally, we use a new evaluation function ⦅·⦆^t_(o,c)
• t is the evaluation time instant
• (o, c) is an additional window identifying the portion of the data in which the event may happen
Event pattern evaluation produces event mappings (μ, t1, t2)
• μ is a solution mapping
• t1 and t2 denote the time interval justifying μ
24. Daniele Dell’Aglio
Event Processing – Evaluation semantics - Examples
The evaluation of EVENT_w1 P1 SEQ EVERY EVENT_w2 P2 is
[Timeline figure: stream items S1–S12 over t = 10…16, with EVENT_w1 P1 and EVERY EVENT_w2 P2 evaluated over their respective windows; no match yet.]
26. Daniele Dell’Aglio
Event Processing – Evaluation semantics - Examples
The evaluation of EVENT_w1 P1 SEQ EVERY EVENT_w2 P2 is
[Timeline figure: the same stream; a first result pairs S1 with S10, justified by the interval [11, 13].]
27. Daniele Dell’Aglio
Event Processing – Evaluation semantics - Examples
The evaluation of EVENT_w1 P1 SEQ EVERY EVENT_w2 P2 is
[Timeline figure: the same stream; a second result pairs S1 with S12, justified by the interval [11, 15].]
28. Daniele Dell’Aglio
Event Processing – MATCH graph pattern
Event patterns are enclosed in MATCH graph patterns
• Event mappings exist only in the context of event patterns
• The evaluation of a MATCH graph pattern produces a bag of solution mappings:
⟦MATCH E⟧^t_SDS(A) = {μ | (μ, t1, t2) ∈ ⦅E⦆^t_(0,t)}
It is possible to combine the MATCH graph pattern with other SPARQL graph patterns
32. Daniele Dell’Aglio
What’s next?
An RSEP-QL query language
• W3C RSP CG ongoing activities
Implementations
• Yet another RSP engine
• A framework to let existing RSP engines interoperate
Streams are getting popular – applications want more and more sophisticated features
• Different timestamps, out-of-order arrivals
• Inductive reasoning to cope with noise
• Permanent storage of portions of data (raw or inferred)
33. Daniele Dell’Aglio
Conclusions
The dynamics introduced by the continuous query evaluation process are not yet fully understood
• Not fully captured by existing models
• RSEP-QL captures those dynamics
• All of them? Let's find out!
We need to push implementations and applications on use cases
• To understand which helpful operators are missing
• To find new, unexpected behaviours
34. Daniele Dell’Aglio
People I am grateful to...
Emanuele Della Valle
and:
Marco Balduini
Jean-Paul Calbimonte
Oscar Corcho
Minh Dao-Tran
Danh Le Phuoc
Freddy Lecue
35. Daniele Dell’Aglio
... without forgetting you!
Thank you! Questions?
On Unified Stream Reasoning
The RDF Stream Processing realm
Daniele Dell’Aglio
daniele.dellaglio@polimi.it
http://dellaglio.org