SlideShare a Scribd company logo
1 of 47
Download to read offline
Querying incomplete data
Cristina Sirangelo
LSV, ENS-Cachan
joint work with Amélie Gheerbrant and Leonid Libkin
March 12th 2015
➡ Introduction
➡ Incomplete data model
➡ Querying incomplete data
‣ the key to tractability: naïve evaluation
➡ What makes naïve evaluation work?
‣ a general framework
➡ Applicability of the framework
➡ Moving forward
Plan
Data incompleteness: missing/unknown data values, partially available data, ...
Reasons:
mistakes: wrong/missing entries
restrictions on data access
data heterogeneity: data exchange/integration
Incomplete data
Querying incomplete data
Semantics of query answering
‣ How should the result of a query be defined
in the presence of incompleteness?
Query evaluation
‣ How do we evaluate a query on
an incomplete database?
‣ Can this be done efficiently ?
?
Q: which employees are managers?
incomplete database
Employee
Green
?
Brown
ManagerManager
Green ?
? Brown
Green ?
Incompleteness in theory and practice
Incompleteness in database systems
‣ Semantics of query answering: poorly designed
‣ Query evaluation: very efficient, optimized query engines
eg:
• In SQL the following are consistent statements for sets X,Y
|X| > |Y| and X−Y=∅
• This may occur if Y contains incomplete information (SQL nulls)
Incompleteness in theory and practice
Incompleteness in database systems
‣ Semantics of query answering: poorly designed
‣ Query evaluation: very efficient, optimized query engines
Theoretical framework for incompleteness
[Imielinski-Lipski, Abiteboul-Kanellakis-Grahne, etc. 80’s]
‣ Semantics of query answering: clean framework, suitable semantics
‣ Query evaluation: hard
Incompleteness in theory and practice
Bridging the gap between theory and systems:
answer queries correctly, use classical query engines
not satisfactorily addressed even in the simplest data model
Incompleteness in database systems
‣ Semantics of query answering: poorly designed
‣ Query evaluation: very efficient, optimized query engines
Theoretical framework for incompleteness
[Imielinski-Lipski, Abiteboul-Kanellakis-Grahne, etc. 80’s]
‣ Semantics of query answering: clean framework, suitable semantics
‣ Query evaluation: hard
➡ Introduction
➡ Incomplete data model
➡ Querying incomplete data
‣ the key to tractability: naïve evaluation
➡ What makes naïve evaluation work?
‣ a general framework
➡ Applicability of the framework
➡ Moving forward
Plan
Incomplete relational data
Complete instance: over Const
dom(I) : the subset of Const ∪ Var occurring in I
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
Const : a countably infinite set of constants
Var : a countably infinite set of variables
(nulls)
I
Incomplete database instance (naïve table) [Imielinski, Lipski ’84]:
database instance over domain Const ∪ Var
⟦I⟧
I
Any incomplete database represents a set of complete databases (possible worlds)
Semantics of incompletness:
a function ⟦ ⟧ associating with each incomplete database a set of complete databases
Semantics of incompleteness
Semantics of incompleteness
Three well known relational semantics:
‣ OWA (OpenWorld Assumption) [Imielinski-Lipski ’84]
‣ CWA (ClosedWorld Assumption) [Reiter ’77, Imielinski-Lipski ’84]
‣ WCWA (Weak ClosedWorld Assumption) [Reiter ’77]
I
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
Semantics of incompleteness
OWA:
Green
White
Brown
Green White
White Brown
Green Black
Green
Brown
Green Brown
Brown Brown
Black Brown
Green
White
Brown
Black
Smith
Green White
White Brown
Green Black
...
⟦I⟧OWA
v: valuation
Interpretation of incompleteness:
• missing data values, missing tuples
x1=White x2=Black
x1=x2=Brown
x1=W
hite
x2=Black
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
⟦I⟧OWA = { D over Const | D ⊇ v(I)
for some v: Var → Const }
I
Semantics of incompleteness
Green Brown
Brown Brown
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
I
CWA:
⟦I⟧CWA = { D over Const | D = v(I)
for some v: Var → Const }
Interpretation of incompleteness:
• missing data values
• no missing tuples
Green
White
Brown
Green White
White Brown
Green Black
x1=White x2=Black
x1=x2=Brown
x1=W
hite
x2=Black
Green
Brown
Green White
White Brown
Green Black
Green
White
Brown
...
⟦I⟧CWA
Semantics of incompleteness
Green
Brown
Green Brown
Brown Brown
Green Green
⟦I⟧WCWA
WCWA:
⟦I⟧WCWA ={D over Const |
D ⊇ v(I), dom(D)=dom(v(I))
for some v: Var → Const}
Interpretation of incompleteness:
• missing data values, missing tuples
• no missing domain elements
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
I
x1=White x2=Black
x1=x2=Brown
x1=W
hite
x2=Black
Green
White
Brown
Green White
White Brown
Green Black
...
Green
White
Brown
Black
Green White
White Brown
Green Black
Green Brown
Plan
➡ Introduction
➡ Incomplete data model
➡ Querying incomplete data
‣ the key to tractability: naïve evaluation
➡ What makes naïve evaluation work?
‣ a general framework
➡ Applicability of the framework
➡ Moving forward
Semantics of query answering: certain answers
Q (Boolean)
I
Querying incomplete databases
⟦I⟧
Certain answers
......
Green
White
Brown
Green White
White Brown
Green Black
under either CWA, OWA and WCWA
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
I
Example Q : “There is a manager who has a manager”
certQ(I) = true ⟦I⟧CWA / OWA / WCWA
Need to use the available (incomplete) data
Computing certain answers on I : usually hard
‣ from coNP-complete to undecidable for FO [Imielinski-Lipski ’84, Abiteboul et al ’91]
certQ(I)
I
Q
Computing certain answers
⟦I⟧
for all I
Naïve evaluation
Q(I) = certQ(I)
Q
Naïve evaluation works for Q :
Q(I) : Q evaluated directly on I,
as if variables were new distinct constants
(xi ≠ xj for i ≠j
xi ≠ c for all c ∈ Const )
I
Q
⟦I⟧
Naïve evaluation
...
Green
White
Brown
Green White
White Brown
Green Black
under CWA, OWA and WCWA
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Green x2
I
Example Q : “There is a manager who has a manager”
certQ(I) = true
Q(I) = true
⟦I⟧CWA / OWA / WCWA
Naïve evaluation
...
Example Q : “There is a manager who has a manager”
Employee ManagerManager
I
⟦I⟧CWA / OWA / WCWA
Generalizing:
Q(I) = certQ(I) for all I
naïve evaluation works for Q
under CWA, OWA and WCWA
Naïve evaluation: a model-checking problem (checking I ⊨ Q) EFFICIENT
‣ PTIME in the size of the instance for FO queries
‣ based on classical query evaluation algorithms of database engines
‣ can benefit from query optimization techniques
Certain answers: an entailment problem (checking that I entails Q) HARD
Naïve evaluation in theory and practice
Naïve evaluation works
correct query answering semantics, classical query evaluation algorithms /
entailment reduces to (straightforward) model-checking
clearly not always possible ! (undecidable vs. PTIME)
⟦I⟧OWA/WCWA
...
Naïve evaluation does not always work
Green
White
Brown
Black
Green White
White Brown
Brown Black
I
Employee
Green
x1
Brown
ManagerManager
Green x1
x1 Brown
Brown x2
D
Q(I) = true
certQ(I) = false under OWA and WCWA
naïve evaluation does not work for Q under OWA and WCWA
A concrete example Q:“All employees are managers”
➡ Introduction
➡ Incomplete data model
➡ Querying incomplete data
‣ the key to tractability: naïve evaluation
➡ What makes naïve evaluation work?
‣ a general framework
➡ Applicability of the framework
➡ Moving forward
Plan
Relating naïve evaluation and syntactic fragments
A unified framework for relating naïve evaluation and syntactic fragments for
several possible semantics:
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Naïve evaluation and monotonicity
Shown in a very general setting subsuming every
data model / semantics of incompleteness
(even beyond relational databases)
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Naïve evaluation and monotonicity
Database domain: a quadruple〈 D, C, ⟦ ⟧, ≈ 〉
description example
D : a set
database objects
(complete and incomplete)
all naïve tables over a fixed
schema σ
C : a subset of D complete database objects all complete instances over σ
⟦ ⟧ : D → 2C semantics of incompleteness ⟦ ⟧OWA , ⟦ ⟧CWA , etc.
≈ : an equivalence
relation on D
equivalence of objects (w.r.t.
queries)
isomorphism of relational
instances
Naïve evaluation and monotonicity
Over 〈 D, C , ⟦ ⟧ , ≈ 〉
Boolean query: Q : D → {true, false}
generic : x ≈ y implies Q(x)=Q(y)
monotone w.r.t. ⟦ ⟧ : y ∈ ⟦x⟧ implies Q(x) Q(y)
Certain answers for x ∈ D : cert(Q, x) = ⋀c ∈ ⟦x⟧ Q(c)
Naïve evaluation works for Q : For all x ∈ D Q(x) = cert(Q, x)
Naïve evaluation and monotonicity
Saturation property for〈 D, C , ⟦ ⟧ , ≈ 〉:
For all x ∈ D there exists y ∈ ⟦x⟧ y ≈ x
Over a saturated database domain,
if Q is a generic Boolean query:
Naïve evaluation works for Q iff
Q is monotone w.r.t. ⟦ ⟧
holds for most common semantics
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Monotonicity and preservation
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Many variants
onto homomorphism: dom(D’) = h( dom(D) )
strong onto homomorphism: D’=h(D)
Monotonicity and preservation
homomorphism D → D’ : a mapping h: dom(D) → dom(D’) s.t. h(D) ⊆ D’
a b
a a
c c
d d
a,b → c
D D’
Q preserved under homomorphism:
D → D’ implies Q(D) Q(D’) for all D
Monotonicity and preservation
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
monotonicity w.r.t different semantics
preservation under different notions of
homomorphism
Preservation and syntactic fragments
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Preservation theorems
‣ syntactic characterizations of preservation
properties of queries in a given logic
‣ classical results in (finite) model theory
Preservation and syntactic fragments
Homomorphism Preservation Theorem: over arbitrary relational structures
an FO query Q is preserved under homomorphism iff Q is equivalent to a UCQ
‣ recently proved over finite structures [Rossman ’08]
Equally expressive:
‣ UCQ (Unions of Conjunctive Queries)
‣ SPJU algebra
‣ ∃Pos , i.e. {∃, ∧, ∨}-FO
Preservation and syntactic fragments
Lyndon Positivity Theorem [Lyndon ’59]: over arbitrary relational structures
an FO query Q is preserved under onto homomorphism iff Q is expressible in Pos
‣ fails in the finite [Ajtai-Gurevich 87, Rosen ’95, Stolboushkin ’95]
Pos: {∀,∃,∧, ∨}-FO
Homomorphism Preservation Theorem: over arbitrary relational structures
an FO query Q is preserved under homomorphism iff Q is equivalent to a UCQ
‣ recently proved over finite structures [Rossman ’08]
Preservation and syntactic fragments
Preservation theorems:
(Syntax Preservation) holds in the finite as well
classes of queries where naïve evaluation works
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Naïve evaluation and syntactic fragments
Three well known semantics as instances of our framework
Naïve evaluation works under:
OWA
Preservation under
homomorphism
∃Pos
WCWA
Preservation
under onto
homomorphism
Pos
CWA
Preservation
under strong onto
homomorphism
Pos+∀G
[Rossman] [Lyndon] [Gheerbrant,
Libkin, S.]
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
• Preservation under strong onto homomorphisms:
‣ There is a preservation result in the infinite [Keisler ‘65]
‣ complex syntactic restrictions, one binary relation only, problematic to extend...
• A new sufficient condition for preservation, with a good syntax:
Pos+∀G : extends Pos with a limited form of negation (universal guards)
Positive fragment with Universal Guards
' := ⇤ | ⌅ | R(¯x) | x = y | ' ⇧ ' | ' ⌃ ' | ⇥x' | x' |
0
@
8¯x ( G(¯x) ! ) with
G : a relation or equality symbol
¯x : a tuple of distinct variables
1
A
Pos+∀G roughly corresponds to SPJU algebra + division
Examples revisited
Q : “There is a manager who has a manager” Pos
Pos+∀G
∃Pos
Q
naïve evaluation works for Q under CWA OWA, WCWA
Examples revisited
Q : “All employees are managers”
Q
Pos
Pos+∀G
∃Posnaïve evaluation works for Q under CWA
(recall: not true under OWA, nor under WCWA)
Naïve evaluation works well beyond ∃Pos under other semantics than OWA
➡ Introduction
➡ Incomplete data model
➡ Querying incomplete data
‣ the key to tractability: naïve evaluation
➡ What makes naïve evaluation work?
‣ a general framework
➡ Applicability of the framework
➡ Moving forward
Plan
Beyond OWA, CWA and WCWA
Semantics of incompleteness have been considered in several contexts:
‣ programming semantics, logic programming, data exchange,...
[Minker’82, Ohori’90, Rounds’91, Libkin’95, Hernich’11]
➡ powerset semantics
➡ minimal semantics
Beyond OWA, CWA and WCWA
Naïve evaluation works under:
Powerset semantics
Preservation under
unions of strong onto
homomorphisms
∃Pos+∀Gbool
Minimal semantics
Preservation under
unions of minimal
homomorphisms
over cores
∃Pos+∀Gbool
Naïve evaluation works
for Q under [[]]
Q is “monotone”
w.r.t. [[]]
Q is preserved under a
class of homomorphisms
Q is expressible in a
syntactic fragment
Preservation
theorems
Beyond the relational data model
Incomplete XML based on a form of tree patterns [Barcelo-Libkin-Poggi-S. ’10]
_ *
*
library
authorauthor author
[name= x ]
[title= y ]
[name= x]
‣ missing data values
‣ missing nodes
‣ missing structural information
✓ labels
✓ parent-child, next-sibling
relationships
✓ etc.
Tree-pattern-queries: the analog of UCQs on trees
[name=
“Abiteboul” ]
Beyond the relational data model
The analog of naïve evaluation works for tree-pattern-queries under OWA on
rigid tree patterns [Barcelo-Libkin-Poggi-S. ’10]
‣ rigidity: essentially avoids structural incompleteness
Our framework explains this result:
‣ database domain:
✓ the set of complete/incomplete trees,
✓ OWA semantics: homomorphism-based
‣ tree pattern queries are preserved under homomorphisms of trees
‣ rigidity ensures the saturation property
Moving forward
Naïve evaluation on combinations of data models/semantics, e.g
➡ tree-structured/CWA
➡ graph-structured data
Query languages beyond FO
➡ fixed-point logics, fragments of SO, etc.
Naïve evaluation over restricted instances
➡ Applications: data integration/exchange
Beyond naïve-evaluation
➡ rewriting of the query/instance
(classical in ontology-based query answering)
Thank you!

More Related Content

Similar to Querying incomplete data

林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
台灣資料科學年會
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
Turi, Inc.
 

Similar to Querying incomplete data (20)

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
DIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slidesDIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slides
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
Data analytics, a (short) tour
Data analytics, a (short) tourData analytics, a (short) tour
Data analytics, a (short) tour
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
A field guide the machine learning zoo
A field guide the machine learning zoo A field guide the machine learning zoo
A field guide the machine learning zoo
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
Using Technology in Data Analysis
Using Technology in Data AnalysisUsing Technology in Data Analysis
Using Technology in Data Analysis
 
Bayesian methods in reliability engineering
Bayesian methods in reliability engineeringBayesian methods in reliability engineering
Bayesian methods in reliability engineering
 
Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
A General Overview of Machine Learning
A General Overview of Machine LearningA General Overview of Machine Learning
A General Overview of Machine Learning
 
The Problem with Traditional Analytics I: Dark Data
The Problem with Traditional Analytics I: Dark DataThe Problem with Traditional Analytics I: Dark Data
The Problem with Traditional Analytics I: Dark Data
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 
Twdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunitiesTwdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunities
 
cs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfcs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdf
 
Distributed processing of large graphs in python
Distributed processing of large graphs in pythonDistributed processing of large graphs in python
Distributed processing of large graphs in python
 

More from INRIA-OAK

Change Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic WebChange Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic Web
INRIA-OAK
 
ANGIE in wonderland
ANGIE in wonderlandANGIE in wonderland
ANGIE in wonderland
INRIA-OAK
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
INRIA-OAK
 

More from INRIA-OAK (20)

Change Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic WebChange Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic Web
 
ANGIE in wonderland
ANGIE in wonderlandANGIE in wonderland
ANGIE in wonderland
 
On building more human query answering systems
On building more human query answering systemsOn building more human query answering systems
On building more human query answering systems
 
Dynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsDynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data Platforms
 
Web Data Management in RDF Age
Web Data Management in RDF AgeWeb Data Management in RDF Age
Web Data Management in RDF Age
 
Oak meeting 18/09/2014
Oak meeting 18/09/2014Oak meeting 18/09/2014
Oak meeting 18/09/2014
 
Nautilus
NautilusNautilus
Nautilus
 
Warg
WargWarg
Warg
 
Vip2p
Vip2pVip2p
Vip2p
 
S4
S4S4
S4
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturator
 
Rdf generator
Rdf generatorRdf generator
Rdf generator
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
 
postgres loader
postgres loaderpostgres loader
postgres loader
 
Plreuse
PlreusePlreuse
Plreuse
 
Paxquery
PaxqueryPaxquery
Paxquery
 
Conjunctive queries
Conjunctive queriesConjunctive queries
Conjunctive queries
 
CliqueSquare processing
CliqueSquare processingCliqueSquare processing
CliqueSquare processing
 
Clique square storage
Clique square storageClique square storage
Clique square storage
 

Recently uploaded

development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 

Recently uploaded (20)

Introduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptxIntroduction of DNA analysis in Forensic's .pptx
Introduction of DNA analysis in Forensic's .pptx
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 

Querying incomplete data

  • 1. Querying incomplete data Cristina Sirangelo LSV, ENS-Cachan joint work with Amélie Gheerbrant and Leonid Libkin March 12th 2015
  • 2. ➡ Introduction ➡ Incomplete data model ➡ Querying incomplete data ‣ the key to tractability: naïve evaluation ➡ What makes naïve evaluation work? ‣ a general framework ➡ Applicability of the framework ➡ Moving forward Plan
  • 3. Data incompleteness: missing/unknown data values, partially available data, ... Reasons: mistakes: wrong/missing entries restrictions on data access data heterogeneity: data exchange/integration Incomplete data
  • 4. Querying incomplete data Semantics of query answering ‣ How should the result of a query be defined in the presence of incompleteness? Query evaluation ‣ How do we evaluate a query on an incomplete database? ‣ Can this be done efficiently ? ? Q: which employees are managers? incomplete database Employee Green ? Brown ManagerManager Green ? ? Brown Green ?
  • 5. Incompleteness in theory and practice Incompleteness in database systems ‣ Semantics of query answering: poorly designed ‣ Query evaluation: very efficient, optimized query engines eg: • In SQL the following are consistent statements for sets X,Y |X| > |Y| and X−Y=∅ • This may occur if Y contains incomplete information (SQL nulls)
  • 6. Incompleteness in theory and practice Incompleteness in database systems ‣ Semantics of query answering: poorly designed ‣ Query evaluation: very efficient, optimized query engines Theoretical framework for incompleteness [Imielinski-Lipski, Abiteboul-Kanellakis-Grahne, etc. 80’s] ‣ Semantics of query answering: clean framework, suitable semantics ‣ Query evaluation: hard
  • 7. Incompleteness in theory and practice Bridging the gap between theory and systems: answer queries correctly, use classical query engines not satisfactorily addressed even in the simplest data model Incompleteness in database systems ‣ Semantics of query answering: poorly designed ‣ Query evaluation: very efficient, optimized query engines Theoretical framework for incompleteness [Imielinski-Lipski, Abiteboul-Kanellakis-Grahne, etc. 80’s] ‣ Semantics of query answering: clean framework, suitable semantics ‣ Query evaluation: hard
  • 8. ➡ Introduction ➡ Incomplete data model ➡ Querying incomplete data ‣ the key to tractability: naïve evaluation ➡ What makes naïve evaluation work? ‣ a general framework ➡ Applicability of the framework ➡ Moving forward Plan
  • 9. Incomplete relational data Complete instance: over Const dom(I) : the subset of Const ∪ Var occurring in I Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 Const : a countably infinite set of constants Var : a countably infinite set of variables (nulls) I Incomplete database instance (naïve table) [Imielinski, Lipski ’84]: database instance over domain Const ∪ Var
  • 10. ⟦I⟧ I Any incomplete database represents a set of complete databases (possible worlds) Semantics of incompletness: a function ⟦ ⟧ associating with each incomplete database a set of complete databases Semantics of incompleteness
  • 11. Semantics of incompleteness Three well known relational semantics: ‣ OWA (OpenWorld Assumption) [Imielinski-Lipski ’84] ‣ CWA (ClosedWorld Assumption) [Reiter ’77, Imielinski-Lipski ’84] ‣ WCWA (Weak ClosedWorld Assumption) [Reiter ’77] I Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2
  • 12. Semantics of incompleteness OWA: Green White Brown Green White White Brown Green Black Green Brown Green Brown Brown Brown Black Brown Green White Brown Black Smith Green White White Brown Green Black ... ⟦I⟧OWA v: valuation Interpretation of incompleteness: • missing data values, missing tuples x1=White x2=Black x1=x2=Brown x1=W hite x2=Black Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 ⟦I⟧OWA = { D over Const | D ⊇ v(I) for some v: Var → Const } I
  • 13. Semantics of incompleteness Green Brown Brown Brown Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 I CWA: ⟦I⟧CWA = { D over Const | D = v(I) for some v: Var → Const } Interpretation of incompleteness: • missing data values • no missing tuples Green White Brown Green White White Brown Green Black x1=White x2=Black x1=x2=Brown x1=W hite x2=Black Green Brown Green White White Brown Green Black Green White Brown ... ⟦I⟧CWA
  • 14. Semantics of incompleteness Green Brown Green Brown Brown Brown Green Green ⟦I⟧WCWA WCWA: ⟦I⟧WCWA ={D over Const | D ⊇ v(I), dom(D)=dom(v(I)) for some v: Var → Const} Interpretation of incompleteness: • missing data values, missing tuples • no missing domain elements Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 I x1=White x2=Black x1=x2=Brown x1=W hite x2=Black Green White Brown Green White White Brown Green Black ... Green White Brown Black Green White White Brown Green Black Green Brown
  • 15. Plan ➡ Introduction ➡ Incomplete data model ➡ Querying incomplete data ‣ the key to tractability: naïve evaluation ➡ What makes naïve evaluation work? ‣ a general framework ➡ Applicability of the framework ➡ Moving forward
  • 16. Semantics of query answering: certain answers Q (Boolean) I Querying incomplete databases ⟦I⟧
  • 17. Certain answers ...... Green White Brown Green White White Brown Green Black under either CWA, OWA and WCWA Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 I Example Q : “There is a manager who has a manager” certQ(I) = true ⟦I⟧CWA / OWA / WCWA
  • 18. Need to use the available (incomplete) data Computing certain answers on I : usually hard ‣ from coNP-complete to undecidable for FO [Imielinski-Lipski ’84, Abiteboul et al ’91] certQ(I) I Q Computing certain answers ⟦I⟧
  • 19. for all I Naïve evaluation Q(I) = certQ(I) Q Naïve evaluation works for Q : Q(I) : Q evaluated directly on I, as if variables were new distinct constants (xi ≠ xj for i ≠j xi ≠ c for all c ∈ Const ) I Q ⟦I⟧
  • 20. Naïve evaluation ... Green White Brown Green White White Brown Green Black under CWA, OWA and WCWA Employee Green x1 Brown ManagerManager Green x1 x1 Brown Green x2 I Example Q : “There is a manager who has a manager” certQ(I) = true Q(I) = true ⟦I⟧CWA / OWA / WCWA
  • 21. Naïve evaluation ... Example Q : “There is a manager who has a manager” Employee ManagerManager I ⟦I⟧CWA / OWA / WCWA Generalizing: Q(I) = certQ(I) for all I naïve evaluation works for Q under CWA, OWA and WCWA
  • 22. Naïve evaluation: a model-checking problem (checking I ⊨ Q) EFFICIENT ‣ PTIME in the size of the instance for FO queries ‣ based on classical query evaluation algorithms of database engines ‣ can benefit from query optimization techniques Certain answers: an entailment problem (checking that I entails Q) HARD Naïve evaluation in theory and practice Naïve evaluation works correct query answering semantics, classical query evaluation algorithms / entailment reduces to (straightforward) model-checking clearly not always possible ! (undecidable vs. PTIME)
  • 23. ⟦I⟧OWA/WCWA ... Naïve evaluation does not always work Green White Brown Black Green White White Brown Brown Black I Employee Green x1 Brown ManagerManager Green x1 x1 Brown Brown x2 D Q(I) = true certQ(I) = false under OWA and WCWA naïve evaluation does not work for Q under OWA and WCWA A concrete example Q:“All employees are managers”
  • 24. ➡ Introduction ➡ Incomplete data model ➡ Querying incomplete data ‣ the key to tractability: naïve evaluation ➡ What makes naïve evaluation work? ‣ a general framework ➡ Applicability of the framework ➡ Moving forward Plan
  • 25. Relating naïve evaluation and syntactic fragments A unified framework for relating naïve evaluation and syntactic fragments for several possible semantics: Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 26. Naïve evaluation and monotonicity Shown in a very general setting subsuming every data model / semantics of incompleteness (even beyond relational databases) Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 27. Naïve evaluation and monotonicity Database domain: a quadruple〈 D, C, ⟦ ⟧, ≈ 〉 description example D : a set database objects (complete and incomplete) all naïve tables over a fixed schema σ C : a subset of D complete database objects all complete instances over σ ⟦ ⟧ : D → 2C semantics of incompleteness ⟦ ⟧OWA , ⟦ ⟧CWA , etc. ≈ : an equivalence relation on D equivalence of objects (w.r.t. queries) isomorphism of relational instances
  • 28. Naïve evaluation and monotonicity Over 〈 D, C , ⟦ ⟧ , ≈ 〉 Boolean query: Q : D → {true, false} generic : x ≈ y implies Q(x)=Q(y) monotone w.r.t. ⟦ ⟧ : y ∈ ⟦x⟧ implies Q(x) Q(y) Certain answers for x ∈ D : cert(Q, x) = ⋀c ∈ ⟦x⟧ Q(c) Naïve evaluation works for Q : For all x ∈ D Q(x) = cert(Q, x)
  • 29. Naïve evaluation and monotonicity Saturation property for〈 D, C , ⟦ ⟧ , ≈ 〉: For all x ∈ D there exists y ∈ ⟦x⟧ y ≈ x Over a saturated database domain, if Q is a generic Boolean query: Naïve evaluation works for Q iff Q is monotone w.r.t. ⟦ ⟧ holds for most common semantics Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 30. Monotonicity and preservation Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 31. Many variants onto homomorphism: dom(D’) = h( dom(D) ) strong onto homomorphism: D’=h(D) Monotonicity and preservation homomorphism D → D’ : a mapping h: dom(D) → dom(D’) s.t. h(D) ⊆ D’ a b a a c c d d a,b → c D D’ Q preserved under homomorphism: D → D’ implies Q(D) Q(D’) for all D
  • 32. Monotonicity and preservation Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems monotonicity w.r.t different semantics preservation under different notions of homomorphism
  • 33. Preservation and syntactic fragments Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems Preservation theorems ‣ syntactic characterizations of preservation properties of queries in a given logic ‣ classical results in (finite) model theory
  • 34. Preservation and syntactic fragments Homomorphism Preservation Theorem: over arbitrary relational structures an FO query Q is preserved under homomorphism iff Q is equivalent to a UCQ ‣ recently proved over finite structures [Rossman ’08] Equally expressive: ‣ UCQ (Unions of Conjunctive Queries) ‣ SPJU algebra ‣ ∃Pos , i.e. {∃, ∧, ∨}-FO
  • 35. Preservation and syntactic fragments Lyndon Positivity Theorem [Lyndon ’59]: over arbitrary relational structures an FO query Q is preserved under onto homomorphism iff Q is expressible in Pos ‣ fails in the finite [Ajtai-Gurevich 87, Rosen ’95, Stolboushkin ’95] Pos: {∀,∃,∧, ∨}-FO Homomorphism Preservation Theorem: over arbitrary relational structures an FO query Q is preserved under homomorphism iff Q is equivalent to a UCQ ‣ recently proved over finite structures [Rossman ’08]
  • 36. Preservation and syntactic fragments Preservation theorems: (Syntax Preservation) holds in the finite as well classes of queries where naïve evaluation works Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 37. Naïve evaluation and syntactic fragments Three well known semantics as instances of our framework Naïve evaluation works under: OWA Preservation under homomorphism ∃Pos WCWA Preservation under onto homomorphism Pos CWA Preservation under strong onto homomorphism Pos+∀G [Rossman] [Lyndon] [Gheerbrant, Libkin, S.] Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 38. • Preservation under strong onto homomorphisms: ‣ There is a preservation result in the infinite [Keisler ‘65] ‣ complex syntactic restrictions, one binary relation only, problematic to extend... • A new sufficient condition for preservation, with a good syntax: Pos+∀G : extends Pos with a limited form of negation (universal guards) Positive fragment with Universal Guards ' := ⇤ | ⌅ | R(¯x) | x = y | ' ⇧ ' | ' ⌃ ' | ⇥x' | x' | 0 @ 8¯x ( G(¯x) ! ) with G : a relation or equality symbol ¯x : a tuple of distinct variables 1 A Pos+∀G roughly corresponds to SPJU algebra + division
  • 39. Examples revisited Q : “There is a manager who has a manager” Pos Pos+∀G ∃Pos Q naïve evaluation works for Q under CWA OWA, WCWA
  • 40. Examples revisited Q : “All employees are managers” Q Pos Pos+∀G ∃Posnaïve evaluation works for Q under CWA (recall: not true under OWA, nor under WCWA) Naïve evaluation works well beyond ∃Pos under other semantics than OWA
  • 41. ➡ Introduction ➡ Incomplete data model ➡ Querying incomplete data ‣ the key to tractability: naïve evaluation ➡ What makes naïve evaluation work? ‣ a general framework ➡ Applicability of the framework ➡ Moving forward Plan
  • 42. Beyond OWA, CWA and WCWA Semantics of incompleteness have been considered in several contexts: ‣ programming semantics, logic programming, data exchange,... [Minker’82, Ohori’90, Rounds’91, Libkin’95, Hernich’11] ➡ powerset semantics ➡ minimal semantics
  • 43. Beyond OWA, CWA and WCWA Naïve evaluation works under: Powerset semantics Preservation under unions of strong onto homomorphisms ∃Pos+∀Gbool Minimal semantics Preservation under unions of minimal homomorphisms over cores ∃Pos+∀Gbool Naïve evaluation works for Q under [[]] Q is “monotone” w.r.t. [[]] Q is preserved under a class of homomorphisms Q is expressible in a syntactic fragment Preservation theorems
  • 44. Beyond the relational data model Incomplete XML based on a form of tree patterns [Barcelo-Libkin-Poggi-S. ’10] _ * * library authorauthor author [name= x ] [title= y ] [name= x] ‣ missing data values ‣ missing nodes ‣ missing structural information ✓ labels ✓ parent-child, next-sibling relationships ✓ etc. Tree-pattern-queries: the analog of UCQs on trees [name= “Abiteboul” ]
  • 45. Beyond the relational data model The analog of naïve evaluation works for tree-pattern-queries under OWA on rigid tree patterns [Barcelo-Libkin-Poggi-S. ’10] ‣ rigidity: essentially avoids structural incompleteness Our framework explains this result: ‣ database domain: ✓ the set of complete/incomplete trees, ✓ OWA semantics: homomorphism-based ‣ tree pattern queries are preserved under homomorphisms of trees ‣ rigidity ensures the saturation property
  • 46. Moving forward Naïve evaluation on combinations of data models/semantics, e.g ➡ tree-structured/CWA ➡ graph-structured data Query languages beyond FO ➡ fixed-point logics, fragments of SO, etc. Naïve evaluation over restricted instances ➡ Applications: data integration/exchange Beyond naïve-evaluation ➡ rewriting of the query/instance (classical in ontology-based query answering)