
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Though partially automated, developing schema mappings remains a complex and potentially error-prone task. In this paper, we present TRAMP (TRAnsformation Mapping Provenance), an extensive suite of tools supporting the debugging and tracing of schema mappings and transformation queries. TRAMP combines and extends data provenance with two novel notions, transformation provenance and mapping provenance, to explain the relationship between transformed data and those transformations and mappings that produced that data. In addition we provide query support for transformations, data, and all forms of provenance. We formally define transformation and mapping provenance, present an efficient implementation of both forms of provenance, and evaluate the resulting system through extensive experiments.

  1. Introduction Approach Transformation Provenance Implementation Results Conclusions
     TRAMP: Understanding the Behaviour of Schema Mappings through Provenance
     Boris Glavic (1), Gustavo Alonso (2), Renée J. Miller (1), Laura M. Haas (3)
     (1) University of Toronto, (2) ETH Zurich, (3) IBM Almaden Research Center
     VLDB 2010, September 16, 2010
  2. Outline
     1 Introduction
     2 Approach
     3 Transformation Provenance
     4 Implementation
     5 Results
     6 Conclusions
  3. What is TRAMP?
     TRAMP: Transformation Mapping Provenance
     • Novel "holistic" approach to help users understand schema mappings
       • Data Exchange
       • Data Integration
     • Provides query language access for
       • Mapping scenarios
       • Provenance
       • Data
     Slide 1 of 27 Boris Glavic TRAMP
  4. Data Exchange
     Problem Statement
     • Given a source and a target schema
     • How to map data from the source to the target?
     General Approach
     1 Find correspondences between schema elements
     2 Generate schema mappings from correspondences and schema constraints
     3 Generate implementing transformations
     4 Execute transformations
  5. Data Exchange Process: 1) Find Correspondences
     Correspondence
     • Attribute X represents the same information as attribute Y
     • Generated by an automatic matcher or by the user
     Example
       Source: Employee(Name, Address), Address(Id, City, Street)
       Target: Person(Name, LivesAt, Gender)
  7. Data Exchange Process: 2) Generate Schema Mappings
     Schema Mapping
     • Declarative constraints that model relationships between schemas (s-t tgds, i.e. source-to-target tuple-generating dependencies)
     • Generated from correspondences and schema constraints
     Example
       For every employee with an associated address, there exists a person with the Name of the employee and the City of the address stored in the LivesAt attribute.
  9. Data Exchange Process: 2) Generate Schema Mappings
     Example (as s-t tgds)
       M1: ∀a, b, c, d: Employee(a, b) ∧ Address(b, c, d) ⇒ ∃f: Person(a, c, f)
       M2: ∀a, b: Employee(a, b) ⇒ ∃c, d: Person(a, c, d)
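
The two mappings can be read as checks on a pair of source and target instances. The following is a minimal sketch under that reading, not code from TRAMP: the function names `satisfies_m1` and `satisfies_m2` are hypothetical, relations are modelled as plain Python sets of tuples, and existentially quantified target values are simply allowed to be anything (including NULL, written `None`).

```python
def satisfies_m1(employee, address, person):
    """M1: every Employee(a, b) joined with Address(b, c, d) needs some Person(a, c, f)."""
    return all(
        any(p[0] == a and p[1] == c for p in person)
        for (a, b) in employee
        for (addr_id, c, _street) in address
        if b == addr_id
    )


def satisfies_m2(employee, person):
    """M2: every Employee(a, b) needs some Person(a, c, d)."""
    return all(any(p[0] == a for p in person) for (a, _b) in employee)


# Source and target instances taken from the running example of the talk.
EMPLOYEE = {("Gerd", 2), ("Nandy", None)}
ADDRESS = {(1, "Prag", "Krutz"), (2, "Aachen", "Pond")}
PERSON = {("Gerd", "Aachen", None), ("Gerd", None, None), ("Nandy", None, None)}

if __name__ == "__main__":
    print(satisfies_m1(EMPLOYEE, ADDRESS, PERSON))
    print(satisfies_m2(EMPLOYEE, PERSON))
```

Dropping the tuple `("Gerd", "Aachen", None)` from the target violates M1 but not M2, which is the kind of difference mapping provenance is meant to expose.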
  10. Data Exchange Process: 3) Generate Implementing Transformations
      Implementing Transformation
      • Schema mappings do not specify the full target instance
      • Need an executable transformation
      • Generated from schema mappings (XQuery, SQL, ...)
      • Many-to-many relationship between mappings and transformations
      Example
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e;
  11. Data Exchange Process: 4) Execute
      Source
        Employee
          Name   Address
          Gerd   2
          Nandy  NULL
        Address
          Id  City    Street
          1   Prag    Krutz
          2   Aachen  Pond
      →
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e;
      →
      Target
        Person
          Name   LivesAt  Gender
          Gerd   Aachen   NULL
          Gerd   NULL     NULL
          Nandy  NULL     NULL
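
The execute step can be reproduced with any SQL engine. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for the database used in the talk; the helper name `run_exchange` and the column types are assumptions made for illustration.

```python
import sqlite3

# The implementing transformation from the slides, unchanged apart from layout.
TRANSFORMATION = """
SELECT Name, City AS LivesAt, NULL AS Gender
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION
SELECT Name, NULL AS LivesAt, NULL AS Gender
FROM Employee e
"""


def run_exchange():
    """Load the example source instance and return the target Person tuples as a set."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (Name TEXT, Address INTEGER);
        CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
        INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
        INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
    """)
    rows = set(con.execute(TRANSFORMATION).fetchall())
    con.close()
    return rows


if __name__ == "__main__":
    for row in sorted(run_exchange(), key=repr):
        print(row)
```

The result is returned as a set because `UNION` removes duplicates but does not guarantee a row order.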
  13. Understanding and Debugging Schema Mappings
      Complex error-prone process
      • Many sources of error:
        • Faulty source data
        • Incorrect correspondences
        • Incorrect schema mappings
        • Incorrect transformations
      • Hard to trace error source
      How to help the user?
      • Provide information that aids in debugging
      • Allow for combination and filtering ⇒ Query language
  17. Information of Interest
      1 Where is data derived from?
      2 Which mapping created which data?
      3 Which part of a transformation created which data?
      4 Mapping Scenario (Interrelationships)
  18. Where is Data Derived from?
      Example
        Person
          Name   LivesAt  Gender
          Gerd   Aachen   NULL
          Gerd   NULL     NULL
          Nandy  NULL     NULL
        ↑
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e;
        ↑
        Employee
          Name   Address
          Gerd   2
          Nandy  NULL
        ↑
        Address
          Id  City    Street
          1   Prag    Krutz
          2   Aachen  Pond
        ↑
        Employee
          Name   Address
          Gerd   2
          Nandy  NULL
  19. Data Provenance
      Description
      • Which input tuples contributed to which output tuples?
      Example
        The Person instance and the three source relation accesses (Employee, Address, Employee) of slide 18, shown without the query.
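
To make "which input tuples contributed to which output tuples" concrete, here is a small sketch that extends each result tuple with the source tuples it was built from. It is a hand-written query run on Python's `sqlite3`, not the output of Perm's `SELECT PROVENANCE` rewrite; the `prov_...` column names and the helper `data_provenance` are assumptions chosen for readability.

```python
import sqlite3

# Each UNION branch carries its contributing Employee (and Address) tuple along.
PROVENANCE_QUERY = """
SELECT Name, City AS LivesAt, NULL AS Gender,
       e.Name AS prov_employee_name, e.Address AS prov_employee_address,
       a.Id AS prov_address_id, a.City AS prov_address_city,
       a.Street AS prov_address_street
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION ALL
SELECT Name, NULL, NULL, e.Name, e.Address, NULL, NULL, NULL
FROM Employee e
"""


def data_provenance():
    """Map each target tuple to a list of (employee_tuple, address_tuple_or_None)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (Name TEXT, Address INTEGER);
        CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
        INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
        INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
    """)
    result = {}
    for name, lives_at, gender, e_name, e_addr, a_id, a_city, a_street in con.execute(PROVENANCE_QUERY):
        address = None if a_id is None else (a_id, a_city, a_street)
        result.setdefault((name, lives_at, gender), []).append(((e_name, e_addr), address))
    con.close()
    return result


if __name__ == "__main__":
    for target, sources in data_provenance().items():
        print(target, "<-", sources)
```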
  20. Which Mapping Created which Data?
      Example
        Person
          Name   LivesAt  Gender
          Gerd   Aachen   NULL
          Gerd   NULL     NULL
          Nandy  NULL     NULL
        ↑
        Mapping M1
  21. Which Mapping Created which Data?
      Example
        Person (instance as on slide 20)
        ↑
        Mapping M2
  22. Mapping Provenance
      Description
      • Which mappings generated a target tuple?
      Example
        Person (instance as on slide 20)
        ↑
        Mapping M2
  23. Which Transformation Part Created which Data?
      [Figure: algebra tree of the transformation with leaves Employee, Address, and Employee, and a 0/1 annotation on each node; the positions of the annotations are not recoverable from the extraction]
      Example
        Person
          Name   LivesAt  Gender
          Gerd   Aachen   NULL
          Gerd   NULL     NULL
          Nandy  NULL     NULL
        ↑
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e;
  24. Which Transformation Part Created which Data?
      [Figure: the same algebra tree with a different set of 0/1 annotations; positions not recoverable]
      Example
        Person instance and transformation as on slide 23
  25. Transformation Provenance
      Description
      • Which parts of a transformation contributed to an output tuple?
      Example
        Person instance and transformation as on slide 23
  27. Overview
      Glue everything together
      • Store all mapping scenario elements and their inter-relationships in a database
        • Relational: instance data, schemas, transformations (views)
        • XML: mappings, correspondences
      Provenance SQL extension
      • On-demand provenance computation
      • Relational/XML representation
      • PROVENANCE: data provenance (relational)
      • TRANSXML: transformation provenance (XML)
      One-fits-all query language: SQL
      • SQL: data and data provenance
      • XSLT: queries and transformation provenance
  28. Related Work
      Understanding Schema Mappings
      • Approaches using provenance
        • SPIDER [Chiticariu, Tan, VLDB '06]
        • MXQL [Velegrakis, Miller, Mylopoulos, ICDE '05]
        • Orchestra [Karvounarakis, Ives, Tannen, SIGMOD '10]
      • Example-based approaches
        • Clio Data Viewer [Yan, Miller, Haas, Fagin, SIGMOD '01]
        • MUSE [Alexe, Chiticariu, Miller, Tan, ICDE '08]
  29. Related Work
      Understanding Schema Mappings: approaches using provenance
        Supports \ System          SPIDER  MXQL  Orchestra
        Data Provenance            X       *     X
        Mapping Provenance         X       X     X
        Transformation Provenance  -       -     -
        Querying Provenance        -       X     X
        Querying Mappings          -       -     -
  30. Contributions
      Provenance
      • Data Provenance: Perm (ASPJ-Set queries with nested subqueries)
      • Mapping Provenance: in combination with transformations (ASPJ-Set)
      • Transformation Provenance: NEW (ASPJ-Set)
      Querying
      • Single query language: Data + Provenance + Mapping Scenarios
  31. System > Sum(components)
      Advanced Examples
      • Which tuples were derived through mapping M1 or M2, without accessing source relation R, and are derived from tuple x in relation Y?
      • Are there tuples in the target relation R that have been derived from the same input tuple?
      Error Classification
      • Classification of error types in the paper
      • For each error type: an example of how to use TRAMP to debug it
  33. Modelling Transformation Provenance
      Representation
      • Transformation provenance of tuple t from query q: (set of) annotated algebra trees for q
      • Each node carries a boolean annotation
        • 1-annotation: operator contributed
        • 0-annotation: operator did not contribute
      Example
        [Figure: two annotated algebra trees of the UNION query, each with leaves Employee, Address, and Employee and a 0/1 annotation per node; the positions of the annotations are not recoverable from the extraction]
  36. Modelling Transformation Provenance cont.
      Definition
      • For tuple t from query q, the node for op carries a 1-annotation iff evaluating the subtree under op over the data provenance of t does not return the empty set
      Intuition
      • Data provenance is the information necessary to produce t
      • If none of this information "reaches" the output of the subtree under op ⇒ this part of the query did not contribute
      Relation to data provenance
      • Abstraction
      • Different representation
      • Data provenance of all query steps
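
The definition can be applied directly by evaluating each subtree over the data provenance of a tuple and checking for emptiness. The sketch below does this for a left-join query over Employee and Address; the tiny algebra (`Relation`, `LeftJoin`, `Project`), the qualified column names, and the function names are all hypothetical and only serve to illustrate the definition, not how TRAMP computes it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Database = Dict[str, List[Row]]


@dataclass
class Relation:
    name: str


@dataclass
class LeftJoin:
    left: object
    right: object
    on: Callable[[Row, Row], bool]


@dataclass
class Project:
    child: object
    columns: Dict[str, str]  # output column -> input column


def evaluate(op, db: Database) -> List[Row]:
    """Evaluate an operator tree over a database given as lists of dict rows."""
    if isinstance(op, Relation):
        return list(db.get(op.name, []))
    if isinstance(op, LeftJoin):
        right_rows = evaluate(op.right, db)
        out = []
        for left_row in evaluate(op.left, db):
            matches = [r for r in right_rows if op.on(left_row, r)]
            if matches:
                out.extend({**left_row, **r} for r in matches)
            else:
                out.append(dict(left_row))  # unmatched left row, right side missing
        return out
    if isinstance(op, Project):
        return [{o: row.get(i) for o, i in op.columns.items()} for row in evaluate(op.child, db)]
    raise TypeError(f"unknown operator: {op!r}")


def annotate(op, prov_db: Database, out=None) -> Dict[str, int]:
    """1 iff the subtree under op, evaluated over the data provenance, is non-empty."""
    out = {} if out is None else out
    label = op.name if isinstance(op, Relation) else type(op).__name__
    out[label] = 1 if evaluate(op, prov_db) else 0
    for child in (getattr(op, "left", None), getattr(op, "right", None), getattr(op, "child", None)):
        if child is not None:
            annotate(child, prov_db, out)
    return out


QUERY = Project(
    LeftJoin(Relation("Employee"), Relation("Address"),
             on=lambda e, a: e["e.Address"] == a["a.Id"]),
    {"Name": "e.Name", "LivesAt": "a.City"},
)

# Data provenance of (Gerd, Aachen, NULL) and of (Nandy, NULL, NULL).
PROV_GERD = {"Employee": [{"e.Name": "Gerd", "e.Address": 2}],
             "Address": [{"a.Id": 2, "a.City": "Aachen", "a.Street": "Pond"}]}
PROV_NANDY = {"Employee": [{"e.Name": "Nandy", "e.Address": None}], "Address": []}

if __name__ == "__main__":
    print(annotate(QUERY, PROV_GERD))
    print(annotate(QUERY, PROV_NANDY))
```

For Nandy no Address tuple is part of the data provenance, so the Address leaf evaluates to the empty set and receives a 0-annotation while the other three operators receive 1.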
  37. Transformation Provenance Example
      [Figure: annotated algebra tree with leaves Employee and Address; all four annotations are 1]
      Example
        Person
          Name   LivesAt  Gender
          Gerd   Aachen   NULL
          Nandy  NULL     NULL
        ↑
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);
        ↑
        Employee
          Name   Address
          Gerd   2
          Nandy  NULL
        ↑
        Address
          Id  City    Street
          1   Prag    Krutz
          2   Aachen  Pond
  38. Transformation Provenance Example
      [Figure: the same tree with annotations 1, 1, 0, 1 in extraction order; which node carries the 0 is not recoverable from the extraction]
      Example
        Person instance, transformation, and source relations as on slide 37
  39. Relational Representation
      Data + Provenance Relation
      • Query result schema + transformation provenance attribute
      • Duplicate result tuple t
      • Add one annotated tree to each duplicate
      • XML representation of the annotated tree
      Example
        <Query>
          <Select>
            <Attr name="Name"><Var>e.Name</Var></Attr>
            ...
          <From><LeftJoin>
            ...
            <NOT><Relation alias="a">Address</Relation></NOT>
            ...
  41. System Overview
      Based On Perm
      • Modified PostgreSQL server
      • Provenance language constructs implemented as query rewrites
      • "Use SQL to compute the provenance of SQL"
      [Figure: architecture diagram. Labels: User, JDBC, SELECT PROVENANCE * FROM ..., Parser & Analyzer (Postgres Parser, Postgres Analyser), Query Tree, TRAMP Module, Rewritten Query Tree, Optimizer, Query Plan, Execution Engine (Executor), Query Results with columns a, prov_a, prov_b and sample rows (123, 'hello', 2.45) and (445, 'test', 1.333)]
  42. Internal Bitset Representation
      • Annotated algebra trees only vary in their annotations
        ⇒ Factor out the tree
        ⇒ Annotations = set of nodes with 1-annotation
        ⇒ Use a bit-vector
      • Compact representation
      • Set union = bitwise-or
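
A minimal sketch of this representation, assuming a query with four operators and an arbitrarily chosen bit order (most significant bit first); the operator list and the helper names are hypothetical.

```python
# Assumed operator enumeration for a four-operator query; the order is illustrative.
OPERATORS = ["Project", "LeftJoin", "Employee", "Address"]


def singleton(op: str) -> int:
    """Bitset with only the bit of the given operator set."""
    return 1 << (len(OPERATORS) - 1 - OPERATORS.index(op))


def union(*bitsets: int) -> int:
    """Set union of annotation sets is a bitwise OR of their bit-vectors."""
    result = 0
    for bits in bitsets:
        result |= bits
    return result


def members(bitset: int) -> list:
    """Operators carrying a 1-annotation in the given bitset."""
    return [op for op in OPERATORS if bitset & singleton(op)]


def to_bits(bitset: int) -> str:
    """Fixed-width binary rendering, one digit per operator."""
    return format(bitset, f"0{len(OPERATORS)}b")


if __name__ == "__main__":
    static = union(singleton("Project"), singleton("LeftJoin"), singleton("Employee"))
    print(to_bits(static), members(static))
    print(to_bits(union(static, singleton("Address"))))
```

Because the tree itself is factored out, one integer per result tuple is enough to recover the full annotated tree later.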
  43. Transformation Provenance Computation
      Query Rewrite
      • Rewrite query q into qT
      • qT propagates bitsets throughout the query (partial annotated trees)
      • Result construction function fXML: builds XML
      • No data provenance needed!
      • Algebraic rewrite rules (correctness proven)
      Optimizations
      • Depending on the operators, some annotations are static (static = independent of t)
      • Static sets are generated beforehand
  46. Rewrite Algorithm
      1 Analyze query to
        • Enumerate the operators and attach singleton bitsets
        • Determine static bitsets
      2 Apply rewrite rules
        • Recursively to each operator in the query
      3 Add application of fXML
  48. Algorithm Step 1
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);
      [Figure: algebra tree of the query with leaves Employee and Address; the singleton bitsets 1000, 0100, 0010, and 0001 are attached to its four operators]
      For this query
      • Projection + Left Join + Employee is fixed ⇒ use one bitset: 1110
  49. Algorithm Step 2
        SELECT Name, City AS LivesAt, NULL AS Gender,
               bitor(1110, CASE WHEN a.Id IS NULL THEN 0000 ELSE 0001 END) AS trans_prov
        FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);
  52. Rewrite Result
        Name   LivesAt  Gender  trans_prov
        Gerd   Aachen   NULL    1111
        Nandy  NULL     NULL    1110
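
The rewritten query of step 2 can be tried out on the example data. This sketch assumes Python's `sqlite3` instead of the modified PostgreSQL server, replaces the slide's `bitor` with SQLite's `|` operator, and writes the bitsets as integers (14 is binary 1110); the helper name `rewrite_result` is hypothetical.

```python
import sqlite3

# 14 = 0b1110: static bits for projection, left join, and Employee.
# The CASE adds the Address bit (0b0001) only when the join found a partner.
REWRITTEN = """
SELECT Name, City AS LivesAt, NULL AS Gender,
       14 | (CASE WHEN a.Id IS NULL THEN 0 ELSE 1 END) AS trans_prov
FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id)
ORDER BY Name
"""


def rewrite_result():
    """Run the rewritten query and render each bitset as a 4-digit binary string."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (Name TEXT, Address INTEGER);
        CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
        INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
        INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
    """)
    rows = [(name, lives_at, gender, format(bits, "04b"))
            for name, lives_at, gender, bits in con.execute(REWRITTEN)]
    con.close()
    return rows


if __name__ == "__main__":
    for row in rewrite_result():
        print(row)
```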
  53. Algorithm Step 3
        SELECT Name, City AS LivesAt, NULL AS Gender,
               fXML(bitor(1110, CASE WHEN a.Id IS NULL THEN 0000 ELSE 0001 END)) AS trans_prov
        FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);
  54. Rewrite Result
        Name   LivesAt  Gender  trans_prov
        Gerd   Aachen   NULL    <Query><Select>...<From>...
        Nandy  NULL     NULL    <Query><Select>...<From>...
      trans_prov for Gerd
        <Query>
          <Select>
            <Attr name="Name"><Var>e.Name</Var></Attr>
            ...
          <From><LeftJoin>
            ...
            <Relation alias="a">Address</Relation>
            ...
  55. Rewrite Result
        Result table as on slide 54
      trans_prov for Nandy
        <Query>
          <Select>
            <Attr name="Name"><Var>e.Name</Var></Attr>
            ...
          <From><LeftJoin>
            ...
            <NOT><Relation alias="a">Address</Relation></NOT>
            ...
  57. Experimental Results
      • Based on the Amalgam benchmark (publication data)
      • Execution times for implementing transformations with and without transformation provenance
      [Figure: relative overhead (y-axis, 1 to 2.8) against instance size in thousands of publications (x-axis, 1 to 1000) for three series: W/O provenance, Only Bitset, Transformation Prov]
  59. Conclusion
      TRAMP = holistic approach for understanding mappings
      • Perm approach for data provenance
      • Transformation provenance
      • Querying of
        • Data
        • Provenance
        • Mapping scenario information
      Future Work
      • Integrate with data exchange/integration systems
      • Combine with example-based approaches like MUSE
      • Provide hints on how to change mappings to generate an expected result [Tran, Chan, SIGMOD '10]
  60. Questions
      Perm/TRAMP Open Source Release
      • On SourceForge: http://permdbms.sourceforge.net/
  61. Storing Mapping Scenarios
      Storing Mapping Elements
        <Correspondence>
          <from>
            <Table name="Employee">
              <Column>name</Column>
            </Table>
          </from>
          <to>
            <Table name="Person">
              <Column>name</Column>
            </Table>
          </to>
        </Correspondence>
  62. Storing Mapping Scenarios
      Mapping-Transformation Interrelationships
      • Annotate parts of implementing transformations with the corresponding mappings.
        SELECT ANNOT('M1') Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT ANNOT('M2') Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e;
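
`ANNOT` is a TRAMP language construct and is not available in stock SQL engines. As a rough emulation of the idea, each branch of the transformation can carry the name of its mapping as an extra column, and the names can then be collected per target tuple. The sketch below does this with Python's `sqlite3`; the `mapping` column and the helper `mapping_provenance` are assumptions for illustration and say nothing about how TRAMP implements the annotation.

```python
import sqlite3

# Each branch is tagged with the mapping it implements. UNION ALL keeps one row
# per (target tuple, mapping) derivation so that no mapping name is lost.
TAGGED = """
SELECT Name, City AS LivesAt, NULL AS Gender, 'M1' AS mapping
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION ALL
SELECT Name, NULL, NULL, 'M2'
FROM Employee e
"""


def mapping_provenance():
    """Map each target Person tuple to the set of mappings that produced it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (Name TEXT, Address INTEGER);
        CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
        INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
        INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
    """)
    result = {}
    for name, lives_at, gender, mapping in con.execute(TAGGED):
        result.setdefault((name, lives_at, gender), set()).add(mapping)
    con.close()
    return result


if __name__ == "__main__":
    for target, mappings in mapping_provenance().items():
        print(target, sorted(mappings))
```

On the example data the tuple with a city is attributed to M1 and the two tuples without a city to M2.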
