Distributed Stream Consistency Checking
Shen Gao, Daniele Dell’Aglio, Jeff Z. Pan and Abraham Bernstein
Cáceres, Spain, 08.06.2018
Carlo Bernaschina (presenter)
Problem setting
ICWE, 08.06.2018, Distributed Stream Consistency Checking
• Real-time processing of huge volumes of dynamic data
  • Smart cities
  • News
  • Knowledge graphs
The problem of noise
• Streaming data are often noisy
  • Broken sensors
  • Malicious data injection
  • Measurement errors
• How to cope with noise?
  • Machine learning and numerical analysis can cope with noise in time series
  • When streams are complex (as Web streams are), we want to ensure that they comply with a (non-trivial) conceptual model
Research question
How to assess the consistency of streams w.r.t. a fixed conceptual model that is known a priori?
Towards a solution
• How to model the stream consistency check problem?
How to model the conceptual model?
• DL-Lite_core
• The set of PIs and NIs composes a TBox T
[Figure: a class hierarchy over Person, Student, Employee, Faculty, Admin, and PhD student illustrates Positive Inclusions (PI); a disjointness (DJ) between Person and Organization illustrates a Negative Inclusion (NI)]
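A minimal sketch of how such a TBox could be held in code, assuming atomic concept names only (DL-Lite_core also allows role-based concepts, omitted here). The `TBox` class and the parent chosen for PhD student are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TBox:
    # Positive inclusions (PI): (sub, sup) means "every sub is a sup".
    pis: frozenset = field(default_factory=frozenset)
    # Negative inclusions (NI): frozenset({a, b}) means "a and b are disjoint (DJ)".
    nis: frozenset = field(default_factory=frozenset)

    def superclasses(self, concept):
        """Return the concept itself plus everything reachable through PIs."""
        seen, todo = {concept}, [concept]
        while todo:
            current = todo.pop()
            for sub, sup in self.pis:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    todo.append(sup)
        return seen


# Assumed fragment of the slide's hierarchy (PhD student under Student is a guess).
tbox = TBox(
    pis=frozenset({("Student", "Person"), ("Employee", "Person"),
                   ("PhD student", "Student")}),
    nis=frozenset({frozenset({"Person", "Organization"})}),
)
print(sorted(tbox.superclasses("PhD student")))
```

Storing each NI as an unordered pair reflects that disjointness is symmetric.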
How to model the data?
• ABox axioms associate:
  • Individuals to classes
    • Shen is a PhD student
    • University of Zurich is a University
  • Individuals to other individuals
    • Shen attends the University of Zurich
• Inconsistencies arise when the ontology (TBox + ABox) contains contradictions
  • Daniele is a PhD student
  • Daniele is a University
  • PhD student disjoint University
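The contradiction above can be sketched as a small check over class assertions. This is an illustration under simplifying assumptions: only "individual is a class" axioms are considered (role assertions such as "attends" are ignored), and the function name `find_inconsistencies` is hypothetical.

```python
from itertools import combinations

# The single negative inclusion (disjointness) used on the slide.
DISJOINT = {frozenset({"PhD student", "University"})}


def find_inconsistencies(abox, disjoint=DISJOINT):
    """Return (individual, class_a, class_b) for every violated disjointness."""
    classes_of = {}
    for individual, cls in abox:
        classes_of.setdefault(individual, set()).add(cls)
    violations = []
    for individual, classes in classes_of.items():
        # Test every pair of classes asserted for the same individual.
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                violations.append((individual, a, b))
    return violations


consistent_abox = [("Shen", "PhD student"), ("University of Zurich", "University")]
inconsistent_abox = [("Daniele", "PhD student"), ("Daniele", "University")]

print(find_inconsistencies(consistent_abox))
print(find_inconsistencies(inconsistent_abox))
```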
How to model the data stream?
• Ontology stream
  • One static TBox
  • A sequence of time-annotated ABoxes with the updates
• Sliding window over the ontology stream
  • Captures a recent set of events
[Figure: a timeline t with ABoxes A1, A3, and A5 at times 1, 3, and 5, containing { Shen is a PhD student }, { Jeff is an Employee, Daniele is a Student }, and { Avi is a PhD student }, next to the static TBox (Person; Student, Employee; Faculty, Admin, PhD student; Organization; University, High school; DJ)]
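A minimal sketch of a sliding window over an ontology stream, assuming a time-based window of fixed width and a stream ordered by timestamp. The function name `sliding_windows` and the assignment of the three ABoxes to times 1, 3, and 5 are assumptions made for illustration.

```python
def sliding_windows(stream, width):
    """Yield (t, window) per ABox; the window keeps assertions with t - width < ts <= t."""
    buffer = []
    for t, abox in stream:
        buffer.append((t, abox))
        # Evict ABoxes that have fallen out of the window.
        buffer = [(ts, a) for ts, a in buffer if ts > t - width]
        window = set().union(*(a for _, a in buffer))
        yield t, window


# Time-annotated ABoxes as (timestamp, set of (individual, class)) pairs.
stream = [
    (1, {("Shen", "PhD student")}),
    (3, {("Jeff", "Employee"), ("Daniele", "Student")}),
    (5, {("Avi", "PhD student")}),
]
for t, window in sliding_windows(stream, width=3):
    print(t, sorted(window))
```

With a width of 3, the assertion about Shen is still in the window at time 3 but has been evicted by time 5.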
The stream consistency check problem
• Given an ontology stream, we want to check if it is consistent w.r.t. a sliding window of a fixed size
• At each time instant, we want to check if the events captured by the sliding window are consistent
• The TBox and the current window content compose an ontology
[Figure: a timeline t with ABoxes A1, A3, and A5 at times 1, 3, and 5, containing { Shen is a PhD student }, { Jeff is a University, Daniele is a Student }, and { Jeff is a PhD student }, next to the static TBox (Person; Student, Employee; Faculty, Admin, PhD student; Organization; University, High school; DJ)]
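The problem statement can be sketched as a centralized check that is re-run at every time instant. This is only an illustration: the single-parent `PIS` map, the hierarchy it encodes, and the assignment of ABoxes to timestamps are assumptions read off the figure, and the code is not the distributed method presented later.

```python
# Assumed TBox fragment: each class has at most one direct superclass.
PIS = {"PhD student": "Student", "Student": "Person", "University": "Organization"}
NIS = [("Person", "Organization")]  # Person and Organization are disjoint


def ancestors(cls):
    """Return cls plus all its superclasses according to PIS."""
    out = {cls}
    while cls in PIS:
        cls = PIS[cls]
        out.add(cls)
    return out


def is_consistent(window):
    """The TBox plus the window content form an ontology; check it for NI violations."""
    inferred = {}
    for individual, cls in window:
        inferred.setdefault(individual, set()).update(ancestors(cls))
    return not any(a in classes and b in classes
                   for classes in inferred.values() for a, b in NIS)


def check_stream(stream, width):
    """Return (t, consistent?) for every time instant in the stream."""
    results = []
    for t, _ in stream:
        window = [assertion for ts, abox in stream
                  if t - width < ts <= t for assertion in abox]
        results.append((t, is_consistent(window)))
    return results


stream = [
    (1, [("Shen", "PhD student")]),
    (3, [("Jeff", "University"), ("Daniele", "Student")]),
    (5, [("Jeff", "PhD student")]),
]
print(check_stream(stream, width=3))
print(check_stream(stream, width=1))
```

With a width of 3 the two assertions about Jeff fall into the same window at time 5 and the check fails there; with a width of 1 they never meet, so every instant is reported as consistent.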
Towards a solution
• How to model the stream consistency check problem?
  • Description logics, ontology streams
• How to cope with a huge amount of streaming data?
Scalability
How to cope with the problem when the data volume is large?
• Sliding windows
  • The content of the window may still be too large to be processed online
• Distribution of the stream consistency checking process
  • We build our solution on top of a Distributed Stream Processing Engine (DSPE)
  • We adopt the Storm terminology to introduce the main concepts, but they are common to other DSPEs
DSPE concepts
[Figure: a logical topology in which a spout S feeds bolts B1 and B2 through streams of tuples, and the corresponding physical topology in which instances of S, B1, and B2 are deployed on Node 1, Node 2, and Node 3]
Towards a solution
• How to model the stream consistency check problem?
  • Description logics, ontology streams
• How to cope with a huge amount of streaming data?
  • Distributed stream processing engines
• How to perform stream consistency checking over DSPE?
The NI closure
• Given the TBox T, it is possible to compute all the Negative Inclusion axioms that T entails
• The set of all these NI axioms is named the NI closure
[Figure: TBox with Person (Student, Employee; Faculty, Admin, PhD student) disjoint (DJ) from Organization (University, Company)]
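A minimal sketch of the closure computation, assuming atomic concepts only: every declared NI DJ(A, B) is propagated to all pairs of subclasses of A and B. The function names and the reduced TBox (without Faculty, Admin, and PhD student) are assumptions for illustration.

```python
from itertools import product


def descendants(concept, pis):
    """All concepts (transitively) included in `concept`, itself included."""
    found, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for sub, sup in pis:
            if sup == current and sub not in found:
                found.add(sub)
                todo.append(sub)
    return found


def ni_closure(pis, nis):
    """Propagate each DJ(a, b) to every pair of subclasses of a and b."""
    closure = set()
    for a, b in nis:
        for sub_a, sub_b in product(descendants(a, pis), descendants(b, pis)):
            closure.add(frozenset({sub_a, sub_b}))
    return closure


pis = {("Student", "Person"), ("Employee", "Person"),
       ("University", "Organization"), ("Company", "Organization")}
nis = {("Person", "Organization")}
closure = ni_closure(pis, nis)
print(len(closure))
```

For this reduced TBox, one declared NI expands into 3 x 3 = 9 NIs, which is how the closure grows with the depth and width of the hierarchy.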
The NIs Topology Method (NTM)
• The resulting topology is the following
  [Figure: topology with a spout S feeding a single bolt B1]
• A bolt evaluates whether the disjointness axioms in the NI closure are satisfied
• Each axiom is encoded as a conjunction operation
[Figure: conjunction operation o1 combines "Daniele is a Person" and "Daniele is a University" into an inconsistency; conjunction operation o2 combines "Shen is a Company" and "Shen is a Student" into an inconsistency]
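The conjunction operations can be sketched in plain Python as a simulation of what bolt B1 does with the content of a window. No DSPE API is used here; the function name `ntm_bolt` and the nine-element closure are assumptions for illustration.

```python
from itertools import product

# Assumed NI closure: Person and its subclasses are disjoint from
# Organization and its subclasses (9 NIs in total).
LEFT = ["Person", "Student", "Employee"]
RIGHT = ["Organization", "University", "Company"]
NI_CLOSURE = list(product(LEFT, RIGHT))


def ntm_bolt(window):
    """Simulate bolt B1: one conjunction operation per NI in the closure."""
    facts = set(window)
    individuals = sorted({ind for ind, _ in facts})
    inconsistencies = []
    for a, b in NI_CLOSURE:
        for ind in individuals:
            # Conjunction: "ind is a A" AND "ind is a B" signals an inconsistency.
            if (ind, a) in facts and (ind, b) in facts:
                inconsistencies.append((ind, a, b))
    return inconsistencies


window = [("Daniele", "Person"), ("Daniele", "University"),
          ("Shen", "Company"), ("Shen", "Student")]
print(ntm_bolt(window))
```

Every NI in the closure costs one conjunction, so the work done by the single bolt grows with the closure size.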
Improving NTM
• Drawback of NTM
  • The NI closure size can be exponential in the size of the TBox
  • The bolt B1 becomes the bottleneck of the topology
• Introduction of inference operations to reduce the number of conjunction operations
[Figure: topology S -> B1 in which an inference operation o derives "Daniele is a Person" from "Daniele is a Student"]
Improving NTM - intuition
[Figure: for a TBox with Person (Student, Employee) disjoint (DJ) from Organization (University, Company), a topology S -> B1 that checks the 9 NIs of the closure is compared with a topology S -> B1 -> B2 that applies the inferences Student -> Person, Employee -> Person, Company -> Organization, and University -> Organization and then checks only 1 NI]
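The intuition can be sketched as two stages: an inference stage that materializes superclasses, followed by a conjunction stage that needs only the single declared NI. Which bolt hosts which stage is an assumption here, as are the function names; the code only simulates the data flow and uses no DSPE API.

```python
# Inference rules from the slide, as a class -> direct superclass map.
PIS = {"Student": "Person", "Employee": "Person",
       "Company": "Organization", "University": "Organization"}
# After inference, one NI is enough instead of the 9 NIs of the closure.
ESSENTIAL_NIS = [("Person", "Organization")]


def inference_bolt(window):
    """First stage: add the superclass assertions implied by the PIs."""
    facts = set(window)
    for ind, cls in window:
        while cls in PIS:
            cls = PIS[cls]
            facts.add((ind, cls))
    return facts


def conjunction_bolt(facts):
    """Second stage: one conjunction operation per remaining NI."""
    individuals = sorted({ind for ind, _ in facts})
    return [(ind, a, b) for a, b in ESSENTIAL_NIS for ind in individuals
            if (ind, a) in facts and (ind, b) in facts]


window = [("Daniele", "Student"), ("Daniele", "University"), ("Shen", "Employee")]
print(conjunction_bolt(inference_bolt(window)))
```

The trade-off is more tuples flowing between stages in exchange for far fewer conjunctions per tuple.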
The Pipeline Topology Method (LN)
• Computes the NI closure
  • DJ(Person,Publication), DJ(Student,Publication), DJ(Student,Employee), DJ(Article,Student), DJ(Person,Organization), ...
• Identifies the essential NIs
  • DJ(Person,Publication), DJ(Student,Employee), DJ(Person,Organization), ...
• Groups and orders the essential NIs
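The slides do not define "essential", so the following sketch rests on an assumption: an NI is treated as essential when no other NI in the closure entails it through the PIs. The PIs `Student -> Person` and `Article -> Publication` are likewise assumed, and the input is assumed to contain no symmetric duplicates such as DJ(A,B) and DJ(B,A).

```python
def ancestors(concept, pis):
    """Return the concept plus all its (transitive) superclasses."""
    found, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for sub, sup in pis:
            if sub == current and sup not in found:
                found.add(sup)
                todo.append(sup)
    return found


def implied_by(ni, other, pis):
    """True if `other` = DJ(c, d) entails `ni` = DJ(a, b) through the PIs."""
    (a, b), (c, d) = ni, other
    up_a, up_b = ancestors(a, pis), ancestors(b, pis)
    return (c in up_a and d in up_b) or (d in up_a and c in up_b)


def essential_nis(closure, pis):
    """Keep only the NIs that no other NI in the closure entails."""
    return [ni for ni in closure
            if not any(other != ni and implied_by(ni, other, pis)
                       for other in closure)]


pis = {("Student", "Person"), ("Article", "Publication")}  # assumed PIs
closure = [("Person", "Publication"), ("Student", "Publication"),
           ("Student", "Employee"), ("Article", "Student"),
           ("Person", "Organization")]
print(essential_nis(closure, pis))
```

Under these assumptions the five listed NIs reduce to the three shown on the slide; the grouping and ordering step is not sketched because the slides give no detail on it.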
The Pipeline Topology Method (LN) cont’d
• Groups are assigned to bolts
  • This step has a major impact on performance!
• Fewer NIs w.r.t. NTM
Towards a solution
• How to model the stream consistency check problem?
  • Description logics, ontology streams
• How to cope with a huge amount of streaming data?
  • Distributed stream processing engines
• How to perform stream consistency checking over DSPE?
  • NTM, LN
• How do they perform?
Setup
• Ontologies
  • LUBM
    • 56 PIs, 70 NIs
  • NPD
    • 332 PIs, 51 NIs
• Six machines
  • 128 GB RAM
  • 2 E5-2680 v2 processors (10 cores per processor)
• Twitter Heron 0.14.3
Comparing NTM and LN
[Figure: results for NTM and LN-x on a topology with S, B1, and B2]
• LN-x: LN with x NI groups
• Half of the nodes are assigned to check consistency
• Similar results
• LN-2 outperforms NTM by up to 139%
• The load on the first node increases
Investigating the results
[Figure: results for LN and NTM]
Conclusions
• It is possible to perform consistency checking over high volumes of data streams
• We developed two methods (NTM and LN) and studied their performance
  • More than 14 million tuples/minute
  • LN can outperform NTM by up to 300%
• What’s next
  • Towards more expressive ontological languages
  • Repairing inconsistencies
  • Implementation and testing over other DSPEs
Thank you! Questions?

Distributed stream consistency checking

  • 1. Distributed Stream Consistency Checking
      Shen Gao, Daniele Dell’Aglio, Jeff Z. Pan and Abraham Bernstein
      Cáceres, Spain, 08.06.2018
      Carlo Bernaschina (presenter)
  • 2. Problem setting (ICWE, 08.06.2018, Distributed Stream Consistency Checking)
      • Real-time processing of huge volumes of dynamic data
          • Smart cities
          • News
          • Knowledge graph
  • 3. The problem of noise
      • Streaming data are often noisy
          • Broken sensors
          • Malicious data injection
          • Measurement errors
      • How to cope with noise?
          • Machine learning and numerical analyses cope with noise in time series
          • When streams are complex (as Web streams are), we want to ensure that they comply with a (non-trivial) conceptual model
  • 4. Research question
      • How to assess the consistency of streams w.r.t. a fixed conceptual model that is known a priori?
  • 5. Towards a solution
      • How to model the stream consistency check problem?
  • 6. How to model the conceptual model?
      • DL-Lite_core
      • The set of PIs and NIs composes a TBox T
      • [Figure: a class hierarchy over Person, Student, Employee, Faculty, Admin and PhD student illustrating Positive Inclusions (PI), and a disjointness (DJ) involving Person and Organization illustrating a Negative Inclusion (NI)]
  • 7. How to model the data?
      • ABox axioms associate:
          • Individuals to classes
              • Shen is a PhD student
              • University of Zurich is a University
          • Individuals to other individuals
              • Shen attends the University of Zurich
      • Inconsistencies arise when the ontology (TBox + ABox) contains contradictions
          • Daniele is a PhD student
          • Daniele is a University
          • PhD student is disjoint from University
  • 8. How to model the data stream?
      • Ontology stream
          • One static TBox
          • A sequence of time-annotated ABoxes with the updates
      • Sliding window over the ontology stream
          • Captures a recent set of events
      • [Figure: a timeline t with ABoxes A1 = { Shen is a PhD student }, A3 = { Jeff is a Employee, Daniele is a Student } and A5 = { Avi is a PhD student } at times 1, 3 and 5, next to a TBox over Person, Student, Employee, Faculty, Admin, PhD student, Organization, University and High school with a disjointness (DJ)]
  • 9. The stream consistency check problem
      • Given an ontology stream, we want to check if it is consistent w.r.t. a sliding window of a fixed size
          • At each time instant, we want to check if the events captured by the sliding window are consistent
          • The TBox and the current window content compose an ontology
      • [Figure: a timeline t with ABoxes A1 = { Shen is a PhD student }, A3 = { Jeff is a University, Daniele is a Student } and A5 = { Jeff is a PhD student } at times 1, 3 and 5, next to a TBox over Person, Student, Employee, Faculty, Admin, PhD student, Organization, University and High school with a disjointness (DJ)]
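The check described on slide 9 can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the authors' implementation: the class hierarchy and the Person/Organization disjointness are an assumed reading of the TBox figure, the window is assumed to be time-based, and the names `superclasses`, `window_is_consistent` and `check_stream` are hypothetical.

```python
from collections import deque

# Assumed toy TBox: positive inclusions (subclass -> superclass) and one
# disjointness axiom. The exact hierarchy is a guess at the slide's figure.
SUBCLASS_OF = {
    "PhD student": "Student",
    "Student": "Person",
    "Employee": "Person",
    "University": "Organization",
}
DISJOINT = [("Person", "Organization")]


def superclasses(cls):
    """Return cls plus every class that includes it, following the PIs."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def window_is_consistent(window):
    """Check the union of the ABoxes in the window against the TBox."""
    classes_of = {}
    for _, abox in window:
        for individual, cls in abox:
            classes_of.setdefault(individual, set()).update(superclasses(cls))
    return not any(
        a in classes and b in classes
        for classes in classes_of.values()
        for a, b in DISJOINT
    )


def check_stream(stream, window_size):
    """Yield (time, is_consistent) for each time-annotated ABox."""
    window = deque()
    for time, abox in stream:
        window.append((time, abox))
        # Evict ABoxes that have slid out of the time-based window.
        while window[0][0] <= time - window_size:
            window.popleft()
        yield time, window_is_consistent(window)


# The ontology stream from the slide 9 figure.
stream = [
    (1, [("Shen", "PhD student")]),
    (3, [("Jeff", "University"), ("Daniele", "Student")]),
    (5, [("Jeff", "PhD student")]),
]
results = list(check_stream(stream, window_size=3))
print(results)
```

With a window of size 3, the window at time 5 holds A3 and A5, so Jeff is both a University and a PhD student and the check reports an inconsistency. With a window of size 2, A3 has already been evicted at time 5 and the same stream stays consistent.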
  • 10. Towards a solution
      • How to model the stream consistency check problem?
          • Description logics, ontology streams
      • How to cope with a huge amount of streaming data?
  • 11. Scalability
      • How to cope with the problem when the data volume is big?
      • Sliding windows
          • The content of the window may still be too large to be processed online
      • Distribution of the stream consistency checking process
          • We build our solution on top of a Distributed Stream Processing Engine (DSPE)
          • We adopt the Storm terminology to introduce the main concepts, but they are common to other DSPEs
  • 12. DSPE concepts
      • [Figure: a logical topology in which a spout (S) and bolts (B1, B2) exchange tuples, and a physical topology in which spout and bolt instances are deployed on Node 1, Node 2 and Node 3]
  • 13. Towards a solution
      • How to model the stream consistency check problem?
          • Description logics, ontology streams
      • How to cope with a huge amount of streaming data?
          • Distributed stream processing engines
      • How to perform stream consistency checking over DSPE?
  • 14. The NI closure
      • Given the TBox T, it is possible to compute all the possible Negative Inclusion axioms
      • The set of all the possible NI axioms is named NI closure
      • [Figure: TBox over Person, Student, Employee, Faculty, Admin, PhD student, Organization, University and Company with a disjointness (DJ)]
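A rough sketch of how an NI closure can be computed, assuming a TBox restricted to atomic classes (full DL-Lite_core also allows role restrictions, which are ignored here). The PIs and the single NI are the ones shown in the slide 17 example; the helper names `subclasses` and `ni_closure` are hypothetical. The rule applied is that a disjointness between two classes is inherited by every pair of their subclasses.

```python
from itertools import product

# PIs (subclass, superclass) and NIs (disjoint pairs) from the slide 17 example.
PIS = [
    ("Student", "Person"),
    ("Employee", "Person"),
    ("University", "Organization"),
    ("Company", "Organization"),
]
NIS = [("Person", "Organization")]


def subclasses(cls, pis):
    """Return cls and every class transitively included in it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in pis:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def ni_closure(pis, nis):
    """Propagate each disjointness axiom down to all pairs of subclasses."""
    closure = set()
    for a, b in nis:
        for sub_a, sub_b in product(subclasses(a, pis), subclasses(b, pis)):
            # frozenset: disjointness is symmetric, so the pair is unordered.
            closure.add(frozenset((sub_a, sub_b)))
    return closure


closure = ni_closure(PIS, NIS)
print(len(closure))
```

For this small TBox the closure contains 9 NIs (3 classes under Person times 3 under Organization).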
  • 15. The NIs Topology Method (NTM)
      • The resulting topology is the following
          • A bolt evaluates whether the disjointness axioms in the NI closure are satisfied
          • Each axiom is encoded as a conjunction operation
      • [Figure: topology with a spout S and a bolt B1; conjunction o1 combines "Daniele is a Person" and "Daniele is a University" into an inconsistency; conjunction o2 combines "Shen is a Company" and "Shen is a Student" into an inconsistency]
  • 16. Improving NTM
      • Drawback of NTM
          • The size of the NI closure can be exponential in the size of the TBox
          • The bolt B1 becomes the bottleneck of the topology
      • Introduction of inference operations to reduce the number of conjunction operations
      • [Figure: topology with a spout S and a bolt B1; an operation o relates "Daniele is a Student" and "Daniele is a Person"]
  • 17. Improving NTM - intuition
      • TBox: Person, Student, Employee, Organization, University, Company, with a disjointness (DJ)
      • Without inference: 9 NIs
      • With the inferences Student -> Person, Employee -> Person, Company -> Organization, University -> Organization: 1 NI
      • [Figure: two topologies, one with S and B1, one with S, B1 and B2]
  • 18. The Pipeline Topology Method (LN)
      • Computes the NI closure
          • DJ(Person,Publication) DJ(Student,Publication) DJ(Student,Employee) DJ(Article,Student) DJ(Person,Organization) ...
      • Identifies the essential NIs
          • DJ(Person,Publication) DJ(Student,Employee) DJ(Person,Organization) ...
      • Groups and orders the essential NIs
  • 19. The Pipeline Topology Method (LN) cont’d
      • Groups are assigned to bolts
          • This step has a major impact on performance!
      • Fewer NIs w.r.t. NTM
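To make the difference between the two methods concrete, here is a toy comparison in plain Python, with ordinary functions standing in for bolts (no DSPE API is used). It follows one reading of the slide 17 intuition: NTM tests every NI of the closure in a single bolt, while a pipeline first applies the inferences and then tests only the essential NI. The grouping, ordering and bolt-assignment steps of LN are not modelled, and all function names are hypothetical.

```python
# PIs and NIs from the slide 17 example.
SUPERCLASS = {
    "Student": "Person",
    "Employee": "Person",
    "University": "Organization",
    "Company": "Organization",
}
ESSENTIAL_NIS = [("Person", "Organization")]
NI_CLOSURE = [
    (p, o)
    for p in ("Person", "Student", "Employee")
    for o in ("Organization", "University", "Company")
]


def violations(assertions, nis):
    """Return the individuals violating any NI, plus the number of NIs tested."""
    inconsistent = set()
    for a, b in nis:
        for individual, cls in assertions:
            # Conjunction: the individual belongs to both disjoint classes.
            if cls == a and (individual, b) in assertions:
                inconsistent.add(individual)
    return inconsistent, len(nis)


def ntm_bolt(assertions):
    """NTM-style bolt: test every NI in the closure on the raw assertions."""
    return violations(assertions, NI_CLOSURE)


def inference_bolt(assertions):
    """Pipeline, first stage: add the assertions implied by the PIs."""
    inferred = set(assertions)
    for individual, cls in assertions:
        while cls in SUPERCLASS:
            cls = SUPERCLASS[cls]
            inferred.add((individual, cls))
    return inferred


def check_bolt(assertions):
    """Pipeline, second stage: test only the essential NIs."""
    return violations(assertions, ESSENTIAL_NIS)


# Window content based on the slide 15 examples, plus one consistent individual.
window = {
    ("Daniele", "Person"), ("Daniele", "University"),
    ("Shen", "Company"), ("Shen", "Student"),
    ("Jeff", "Employee"),
}
ntm_result, ntm_checks = ntm_bolt(window)
ln_result, ln_checks = check_bolt(inference_bolt(window))
print(ntm_result == ln_result, ntm_checks, ln_checks)
```

Both variants flag Daniele and Shen, but the NTM-style bolt evaluates 9 conjunctions where the pipeline's check stage evaluates 1, at the cost of an extra inference stage.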
  • 20. Towards a solution
      • How to model the stream consistency check problem?
          • Description logics, ontology streams
      • How to cope with a huge amount of streaming data?
          • Distributed stream processing engines
      • How to perform stream consistency checking over DSPE?
          • NTM, LN
      • How do they perform?
  • 21. Setup
      • Ontologies
          • LUBM: 56 PIs, 70 NIs
          • NPD: 332 PIs, 51 NIs
      • Six machines
          • 128 GB RAM
          • 2 E5-2680 v2 processors (10 cores per processor)
      • Twitter Heron 0.14.3
  • 22. Comparing NTM and LN
      • LN-x: x NI groups
      • Half of the nodes assigned to check consistency
      • Similar results
      • LN-2 outperforms NTM by up to 139%
      • The load on the first node increases
      • [Figure: topology with S, B1 and B2]
  • 23. Investigating the results
      • [Figure: results for the LN configurations compared with NTM]
  • 24. Conclusions
      • It is possible to perform consistency checking over high volumes of data streams
      • We developed two methods (NTM and LN) and studied their performance
          • More than 14 million tuples/minute
          • LN can outperform NTM by up to 300%
      • What’s next
          • Towards more expressive ontological languages
          • Repairing inconsistencies
          • Implementation and testing over other DSPEs
  • 25. Thank you! Questions?
      Distributed Stream Consistency Checking
      Shen Gao, Daniele Dell’Aglio, Jeff Z. Pan, Abraham Bernstein