Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance

324 views

Published on

We reconsider some of the explicit and implicit properties that underlie well-established definitions of data provenance semantics. Previous work on comparing provenance semantics has mostly focused on expressive power (does the provenance generated by a certain semantics subsume the provenance generated by other semantics) and on understanding whether a semantics is insensitive to query rewrite (i.e., do equivalent queries have the same provenance). In contrast, we try to investigate why certain semantics possess specific properties (like insensitivity) and whether these properties are always desirable. We present a new property stability with respect to query language extension that, to the best of our knowledge, has not been isolated and studied on its own.

Published in: Science
  • Be the first to comment

  • Be the first to like this

TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance

  1. 1. Properties Semantics Insensitivity Stability Conclusions Reexamining some Holy Grails of Provenance Boris Glavic Renee J. Miller University of Toronto UofT:DB Group TaPP 2011, November 5, 2014
  2. 2. Properties Semantics Insensitivity Stability Conclusions Insensitivity UofT:DB Group Insensitivity to Query Rewrite Equivalent queries have the same provenance Q Q0 ) P(Q; I ; t) = P(Q0; I ; t) Holy Grails
  3. 3. Properties Semantics Insensitivity Stability Conclusions Insensitivity UofT:DB Group Insensitivity to Query Rewrite Equivalent queries have the same provenance Q Q0 ) P(Q; I ; t) = P(Q0; I ; t) Caveat: Which queries are equivalent? Set vs. Bag semantics Query language / Operators Holy Grails
  4. 4. Properties Semantics Insensitivity Stability Conclusions Stability UofT:DB Group Stability with Respect to Query Language Extension Extend query language with new operators ) no change to provenance of queries that do not use new operators Holy Grails
  5. 5. Properties Semantics Insensitivity Stability Conclusions Where UofT:DB Group Where [Buneman et al., 2003] Captures which attribute values in the result of a query have been copied from which attribute values in the instance. Representation: P(Attr (I )) Where: Operator-level syntax-based annotation propagation IWhere: Insensitive variant: Union of Where for all Q0 with Q0 Q Holy Grails
  6. 6. Properties Semantics Insensitivity Stability Conclusions Where UofT:DB Group Where Sensitive, traditionally attributed to being based on query syntax Depends on the internal data- ow inside the query How values are routed through the query IWhere Insensitive by combining Where for all equivalent queries Counterintuitive eect that if (R; t;A) is in the provenance then all (R; t0; A) with t:A = t0:A are in the provenance too. Reason: Can construct equivalent query adding self-join on A Holy Grails
  7. 7. Properties Semantics Insensitivity Stability Conclusions Where UofT:DB Group Example Qa = R Qb = A;B(R ./A=C A!C;B!D(R)) R A B r1 1 2 r2 1 3 r3 2 3 r4 2 5 Qa Qb A B a1 1 2 a2 1 3 a3 2 3 a4 2 5 Provenance Where(Qa; a1; A) = f(r1;A)g Where(Qb; a1; A) = f(r1;A); (r2;A)g IWhere(Qa; a1; A) = IWhere(Qb; a1; A) = f(r1;A); (r2;A)g Holy Grails
  8. 8. Properties Semantics Insensitivity Stability Conclusions Arguments for Insensitivity UofT:DB Group 1 Traditionally observed as advantageous in database research ) Tradition not a solid argument 2 External implementation of sensitive semantics. Computing provenance for a query dierent from the one that will be executed by DBMS ) No way to solve this, but provenance based on user query seems to be reasonable 3 Implementation of sensitive semantics in DB-engine limits optimizer search space ) Insensitive semantics may be harder to compute ) Lack of practical experience ) Realistic sensitivity example? Holy Grails
  9. 9. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group IWhere is union of Where for all equivalent queries Holy Grails
  10. 10. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group IWhere is union of Where for all equivalent queries e.g., SPJ and USPJ equivalences are dierent Holy Grails
  11. 11. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group IWhere is union of Where for all equivalent queries e.g., SPJ and USPJ equivalences are dierent e.g., union Q with a join of Q with some other relation Holy Grails
  12. 12. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group IWhere is union of Where for all equivalent queries e.g., SPJ and USPJ equivalences are dierent e.g., union Q with a join of Q with some other relation Let UWhere be IWhere for USPJ queries Holy Grails
  13. 13. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group IWhere is union of Where for all equivalent queries e.g., SPJ and USPJ equivalences are dierent e.g., union Q with a join of Q with some other relation Let UWhere be IWhere for USPJ queries ) UWhere an attribute value is annotated with all annotations from attribute positions that have the same value Holy Grails
  14. 14. Properties Semantics Insensitivity Stability Conclusions Instability of IWhere UofT:DB Group Example Qa = R R A B r1 1 2 r2 1 3 r3 2 3 r4 2 5 S C s1 2 s2 3 Qa Qb A B a1 1 2 a2 1 3 a3 2 3 a4 2 5 Provenance Where(Qa; a3;A) = f(r3; A)g IWhere(Qa; a3;A) = f(r3; A); (r4; A)g UWhere(Qa; a3;A) = f(r3; A); (r4; A); (r1;B); (s1; C)g Holy Grails
  15. 15. Properties Semantics Insensitivity Stability Conclusions Conclusions UofT:DB Group Take Away Messages Be careful how to achieve a property Insensitivity less applicable to semantics that address internal data- ow Queries with the same external but possibly dierent internal behaviour have the same provenance Some Things I'd Like to See Declarative Semantics ) derive operator-level construction Semantics model processing, but have a insensitive core The never-ending quest: Deal with Negation Other data-models (order) Holy Grails
  16. 16. Questions Overview Properties Semantics Insensitivity Questions UofT:DB Group Semantics Sound Complete Responsible Insensitive (set) Insensitive (bag) Stable Why Wit - X - X X X Why - X - - ? X IWhy - X X X X X Where Where - - - - ? X IWhere - - - X X - How - X - - X X Lineage-based Lineage X X - - - X PI-CS X X - - - X C-CS X - - - - X Causality - X X X X X Reexamining some Holy Grails of Provenance
  17. 17. Questions Overview Properties Semantics Insensitivity Semantics Summary UofT:DB Group Representation Used by P(Attr (I )) Where, IWhere P(P(Tuple(I ))) Wit, Why, IWhy N[Tuple(I )] How f R 1 ; : : : ; R n j R i Ri (Q)g Lineage P(f t1; : : : ; tn j ti 2 Ri (Q) _ ti =?g) PI-CS, C- CS P(Tuple(I )) Causality Reexamining some Holy Grails of Provenance
  18. 18. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Reexamining some Holy Grails of Provenance
  19. 19. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Caveat: Semantics of evaluating query over provenance Reexamining some Holy Grails of Provenance
  20. 20. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Complete: Provenance of t produces at least t t 2 Q(P(Q; I ; t)) Reexamining some Holy Grails of Provenance
  21. 21. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Complete: Provenance of t produces at least t t 2 Q(P(Q; I ; t)) Caveat: Semantics of evaluating query over provenance Reexamining some Holy Grails of Provenance
  22. 22. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Complete: Provenance of t produces at least t t 2 Q(P(Q; I ; t)) Responsible: Every tuple in the provenance of t is necessary to derive t Reexamining some Holy Grails of Provenance
  23. 23. Questions Overview Properties Semantics Insensitivity Sound, Complete, Responsible UofT:DB Group Sound, Complete, Responsible Sound: Provenance of t produces nothing dierent from t. t06= t ) t062 Q(P(Q; I ; t)) Complete: Provenance of t produces at least t t 2 Q(P(Q; I ; t)) Responsible: Every tuple in the provenance of t is necessary to derive t Caveat: . . . from every alternative derivation in the provenance . . . ) factor provenance into alterative derivations Caveat: Dierent ways to model that. E.g., 8t0 2 P(Q; I ; t) : t =2 Q(P(Q; I ; t)

×