We reconsider some of the explicit and implicit properties that underlie well-established definitions of data provenance semantics. Previous work on comparing provenance semantics has mostly focused on expressive power (does the provenance generated by a certain semantics subsume the provenance generated by other semantics) and on understanding whether a semantics is insensitive to query rewrite (i.e., do equivalent queries have the same provenance). In contrast, we try to investigate why certain semantics possess specific properties (like insensitivity) and whether these properties are always desirable. We present a new property stability with respect to query language extension that, to the best of our knowledge, has not been isolated and studied on its own.
TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
1. Properties Semantics Insensitivity Stability Conclusions
Reexamining some Holy Grails of Provenance
Boris Glavic Renee J. Miller
University of Toronto
UofT:DB Group
TaPP 2011, November 5, 2014
2. Properties Semantics Insensitivity Stability Conclusions
Insensitivity
UofT:DB Group
Insensitivity to Query Rewrite
Equivalent queries have the same provenance
Q Q0 ) P(Q; I ; t) = P(Q0; I ; t)
Holy Grails
3. Properties Semantics Insensitivity Stability Conclusions
Insensitivity
UofT:DB Group
Insensitivity to Query Rewrite
Equivalent queries have the same provenance
Q Q0 ) P(Q; I ; t) = P(Q0; I ; t)
Caveat: Which queries are equivalent?
Set vs. Bag semantics
Query language / Operators
Holy Grails
4. Properties Semantics Insensitivity Stability Conclusions
Stability
UofT:DB Group
Stability with Respect to Query Language Extension
Extend query language with new operators ) no change to
provenance of queries that do not use new operators
Holy Grails
5. Properties Semantics Insensitivity Stability Conclusions
Where
UofT:DB Group
Where [Buneman et al., 2003]
Captures which attribute values in the result of a query have been
copied from which attribute values in the instance.
Representation: P(Attr (I ))
Where: Operator-level syntax-based annotation propagation
IWhere: Insensitive variant: Union of Where for all Q0 with
Q0 Q
Holy Grails
6. Properties Semantics Insensitivity Stability Conclusions
Where
UofT:DB Group
Where
Sensitive, traditionally attributed to being based on query
syntax
Depends on the internal data-
ow inside the query
How values are routed through the query
IWhere
Insensitive by combining Where for all equivalent queries
Counterintuitive eect that if (R; t;A) is in the provenance
then all (R; t0; A) with t:A = t0:A are in the provenance too.
Reason: Can construct equivalent query adding self-join on A
Holy Grails
7. Properties Semantics Insensitivity Stability Conclusions
Where
UofT:DB Group
Example
Qa = R
Qb = A;B(R ./A=C A!C;B!D(R))
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qa Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
Where(Qa; a1; A) = f(r1;A)g
Where(Qb; a1; A) = f(r1;A); (r2;A)g
IWhere(Qa; a1; A) = IWhere(Qb; a1; A) = f(r1;A); (r2;A)g
Holy Grails
8. Properties Semantics Insensitivity Stability Conclusions
Arguments for Insensitivity
UofT:DB Group
1 Traditionally observed as advantageous in database research
) Tradition not a solid argument
2 External implementation of sensitive semantics. Computing
provenance for a query dierent from the one that will be
executed by DBMS
) No way to solve this, but provenance based on user query
seems to be reasonable
3 Implementation of sensitive semantics in DB-engine limits
optimizer search space
) Insensitive semantics may be harder to compute
) Lack of practical experience
) Realistic sensitivity example?
Holy Grails
9. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
IWhere is union of Where for all equivalent queries
Holy Grails
10. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
IWhere is union of Where for all equivalent queries
e.g., SPJ and USPJ equivalences are dierent
Holy Grails
11. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
IWhere is union of Where for all equivalent queries
e.g., SPJ and USPJ equivalences are dierent
e.g., union Q with a join of Q with some other relation
Holy Grails
12. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
IWhere is union of Where for all equivalent queries
e.g., SPJ and USPJ equivalences are dierent
e.g., union Q with a join of Q with some other relation
Let UWhere be IWhere for USPJ queries
Holy Grails
13. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
IWhere is union of Where for all equivalent queries
e.g., SPJ and USPJ equivalences are dierent
e.g., union Q with a join of Q with some other relation
Let UWhere be IWhere for USPJ queries
) UWhere an attribute value is annotated with all
annotations from attribute positions that have the same value
Holy Grails
14. Properties Semantics Insensitivity Stability Conclusions
Instability of IWhere
UofT:DB Group
Example
Qa = R
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
S
C
s1 2
s2 3
Qa Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
Where(Qa; a3;A) = f(r3; A)g
IWhere(Qa; a3;A) = f(r3; A); (r4; A)g
UWhere(Qa; a3;A) = f(r3; A); (r4; A); (r1;B); (s1; C)g
Holy Grails
15. Properties Semantics Insensitivity Stability Conclusions
Conclusions
UofT:DB Group
Take Away Messages
Be careful how to achieve a property
Insensitivity less applicable to semantics that address internal
data-
ow
Queries with the same external but possibly dierent internal
behaviour have the same provenance
Some Things I'd Like to See
Declarative Semantics ) derive operator-level construction
Semantics model processing, but have a insensitive core
The never-ending quest: Deal with Negation
Other data-models (order)
Holy Grails
16. Questions Overview Properties Semantics Insensitivity
Questions
UofT:DB Group
Semantics
Sound
Complete
Responsible
Insensitive (set)
Insensitive (bag)
Stable
Why
Wit - X - X X X
Why - X - - ? X
IWhy - X X X X X
Where
Where - - - - ? X
IWhere - - - X X -
How - X - - X X
Lineage-based
Lineage X X - - - X
PI-CS X X - - - X
C-CS X - - - - X
Causality - X X X X X
Reexamining some Holy Grails of Provenance
17. Questions Overview Properties Semantics Insensitivity
Semantics Summary
UofT:DB Group
Representation
Used by
P(Attr (I )) Where,
IWhere
P(P(Tuple(I ))) Wit,
Why,
IWhy
N[Tuple(I )] How
f R
1 ; : : : ; R
n j R
i Ri (Q)g Lineage
P(f t1; : : : ; tn j ti 2 Ri (Q) _ ti =?g) PI-CS, C-
CS
P(Tuple(I )) Causality
Reexamining some Holy Grails of Provenance
18. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Reexamining some Holy Grails of Provenance
19. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Caveat: Semantics of evaluating query over provenance
Reexamining some Holy Grails of Provenance
20. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Complete: Provenance of t produces at least t
t 2 Q(P(Q; I ; t))
Reexamining some Holy Grails of Provenance
21. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Complete: Provenance of t produces at least t
t 2 Q(P(Q; I ; t))
Caveat: Semantics of evaluating query over provenance
Reexamining some Holy Grails of Provenance
22. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Complete: Provenance of t produces at least t
t 2 Q(P(Q; I ; t))
Responsible: Every tuple in the provenance of t is necessary
to derive t
Reexamining some Holy Grails of Provenance
23. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Sound: Provenance of t produces nothing dierent from t.
t06= t ) t062 Q(P(Q; I ; t))
Complete: Provenance of t produces at least t
t 2 Q(P(Q; I ; t))
Responsible: Every tuple in the provenance of t is necessary
to derive t
Caveat: . . . from every alternative derivation in the
provenance . . .
) factor provenance into alterative derivations
Caveat: Dierent ways to model that.
E.g., 8t0 2 P(Q; I ; t) : t =2 Q(P(Q; I ; t) ft0g)
Reexamining some Holy Grails of Provenance
24. Questions Overview Properties Semantics Insensitivity
Sound, Complete, Responsible
UofT:DB Group
Sound, Complete, Responsible
Example
Qb = A;B(R ./A=C A!C;B!D(R))
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
P(Qb; I ; a1) = fr1; r2g
Reexamining some Holy Grails of Provenance
25. Questions Overview Properties Semantics Insensitivity
Lineage-based
UofT:DB Group
Lineage-based [Cui et al., 2000]
Operator-level declarative semantics similar to Why. Provenance is
modeled as a list of subsets of the relations accessed by the query
(leafs of the algebra tree of Q)
Representation: f R
1 ; : : : ; R
n j R
i Ri (Q)g
Lineage: List of subsets of the algebra-tree nodes
Reexamining some Holy Grails of Provenance
26. Questions Overview Properties Semantics Insensitivity
Lineage-based
UofT:DB Group
Lineage-based [Glavic et al., 2009]
Provenance is modeled as a set of witness lists. A witness list is a
list of tuples - one from each relation accessed by the query.
Representation: P(f t1; : : : ; tn j ti 2 Ri (Q) _ ti =?g)
PI-CS: Lineage with dierent representation and broader
query language coverage
C-CS: Similar to Where but with tuple granularity
Reexamining some Holy Grails of Provenance
27. Questions Overview Properties Semantics Insensitivity
Lineage-based
UofT:DB Group
Example
Qc = A(R) [ B(R)
Qd = A(R ./B=C S)
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
S
C
s1 2
s2 3
Qc
A
c1 1
c2 2
c3 3
c4 5
Qd Qe
A
d1 1
d2 2
Provenance
PI CS(Qc ; c2) = f?; r1 ; r3;?; r4;?g
PI CS(Qd ; d1) = f r1; s1 ; r2; s2 g
C CS(Qd ; d1) = f r1;?; r2;?g
Lineage(Qd ; d1) = fr1; r2g; fs1; s2g
Reexamining some Holy Grails of Provenance
28. Questions Overview Properties Semantics Insensitivity
Why
UofT:DB Group
Why [Buneman et al., 2003]
Why-provenance models provenance as a set of witnesses. A
witness w for a tuple t is a subset of the instance I where
t 2 Q(w).
Representation: P(P(Tuple(I )))
Wit: Set of all witnesses
Why: Query-syntax based proof-witnesses
IWhy: Minimal elements from Wit resp. Why
Reexamining some Holy Grails of Provenance
29. Questions Overview Properties Semantics Insensitivity
Why
UofT:DB Group
Example
Qa = R
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qa
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
Wit(Qa; a1) = fJ j J R ^ r1 2 Jg
Why(Qa; a1) = ffr1gg
IWhy(Qa; a1) = ffr1gg
Reexamining some Holy Grails of Provenance
30. Questions Overview Properties Semantics Insensitivity
How
UofT:DB Group
Provenance Semirings [Green et al., 2007]
Tuples of relations annotated with elements from a semiring.
Annotation propagation de
31. ned for positive relational algebra as
operations of the semiring (set dierence and aggregration later).
Representation: N(Tuple(I ))
How: Most general form of annotations: polynomials over
variable representing the instance tuples. Addition indicates
alternative use of tuples; multiplication conjunctive use.
Reexamining some Holy Grails of Provenance
32. Questions Overview Properties Semantics Insensitivity
How
UofT:DB Group
Example
Qb = A;B(R ./A=C A!C;B!D(R))
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
How(Qb; a1) = r 2
1 + r1 r2
Reexamining some Holy Grails of Provenance
33. Questions Overview Properties Semantics Insensitivity
Causality
UofT:DB Group
Causality [Meliou et al., 2010]
Provenance is modeled as a set of causes. A cause c 2 I for a
tuple t is de
34. ned as follows:
1 t62 Q(I fcg)
2 there exists a set C I called contingency so that
t 2 Q(I C) and t62 Q(I C fcg)
Representation: P(Tuple(I ))
Causality: Set of all causes
Reexamining some Holy Grails of Provenance
35. Questions Overview Properties Semantics Insensitivity
Causality
UofT:DB Group
Example
Qa = R
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qa
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
Causality(Qa; a1) = fr1g
Reexamining some Holy Grails of Provenance
36. Questions Overview Properties Semantics Insensitivity
Insensitivity to Query Rewrite
UofT:DB Group
Insensitivity to Query Rewrite
Equivalent queries have the same provenance
Q Q0 ) P(Q; I ; t) = P(Q0; I ; t)
Set resp. Bag semantics
Overview
Insensitive (Set, Bag) Wit, IWhy, IWhere, Causality
Insensitive (Bag) How
Sensitive: Lineage, PI-CS, C-CS, Where
Reexamining some Holy Grails of Provenance
38. ned over black-box behaviour of query ) trivially
insensitive
Why
Sensitive, traditionally attributed to being based on query
syntax
Why may contain tuples that do not contribute to t
) Equivalent queries that apply redundant computations may
contain larger provenance
Caveat: But why does this argument not apply to Wit?
Positive queries: super-set of a witness is also a witness )
tuples used by redundant computations are in Wit
IWhy
Reexamining some Holy Grails of Provenance
Minimal elements of Why
39. Questions Overview Properties Semantics Insensitivity
Why
UofT:DB Group
Example
Qa = R
Qb = A;B(R ./A=C A!C;B!D(R))
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qa Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
Wit(Qa; a1) = Wit(Qb; a1) = fJ j J R ^ r1 2 Jg
Why(Qa; a1) = ffr1gg
Why(Qb; a1) = ffr1g; fr1; r2gg
IWhy(Qa; a1) = IWhy(Qb; a1) = ffr1gg
Reexamining some Holy Grails of Provenance
40. Questions Overview Properties Semantics Insensitivity
Lineage-based
UofT:DB Group
Provenance representation based on query syntax
Trivial examples for sensitivity based on reordering of the
arguments of commutative operators
Reexamining some Holy Grails of Provenance
41. Questions Overview Properties Semantics Insensitivity
Lineage-based
UofT:DB Group
Example
Qd = A(R ./B=C S)
Qe = A(S ./B=C R)
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
S
C
s1 2
s2 3
Qd Qe
A
d1 1
d2 2
Provenance
Lineage(Qd ; d1) = fr1; r2g; fs1; s2g
Lineage(Qe ; d1) = fs1; s2g; fr1; r2g
PI CS(Qd ; d1) = f r1; s1 ; r2; s2 g
PI CS(Qe ; d1) = f s1; r1 ; s2; r2 g
Reexamining some Holy Grails of Provenance
42. Questions Overview Properties Semantics Insensitivity
How
UofT:DB Group
Sensitive (Set):
Insensitive (Bag): Operator semantics de
43. ned to take bag
semantics into account
Reexamining some Holy Grails of Provenance
44. Questions Overview Properties Semantics Insensitivity
How
UofT:DB Group
Example
Qa = R
Qb = A;B(R ./A=C A!C;B!D(R))
R
A B
r1 1 2
r2 1 3
r3 2 3
r4 2 5
Qa Qb
A B
a1 1 2
a2 1 3
a3 2 3
a4 2 5
Provenance
How(Qa; a1) = r1
How(Qb; a1) = r 2
1 + r1 r2
Reexamining some Holy Grails of Provenance