Mazda Trio Meeting


Published in: Automotive, Technology, Sports

  1. Trio: A System for Data, Uncertainty, and Lineage
      • Search “stanford trio”
      • DATA, UNCERTAINTY, LINEAGE
  2. People
      • Current
          • Jennifer Widom (faculty)
          • Omar Benjelloun (post-doc)
          • Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD)
          • Michi Mutsuzaki (MS)
          • Tomoe Sugihara (visitor)
      • Incoming
          • Martin Theobald (post-doc)
          • Raghu Murthy (MS)
          • Ander de Keijzer (visitor)
      • Alums
          • Alon Halevy, Ashok Chandra (visitors)
          • Chris Hayworth (MS)
  3. Why Uncertainty + Lineage?
      • Many applications seem to need both
      • From a technical standpoint, it turns out that lineage...
          • Enables simple and consistent representation of uncertain data
          • Correlates uncertainty in query results with uncertainty in the input data
          • Can make computation over uncertain data more efficient
  4. Trio Components
      • Data Model
          • ULDBs (Uncertainty-Lineage Databases): simple extension to relational model
      • Query Language
          • TriQL: simple extension to SQL, well-defined semantics and intuitive behavior
      • System
          • Version 1: complete system and GUI built on top of conventional DBMS
  5. Running Example: Crime-Solving
      • Saw(witness, car) // may be uncertain
      • Drives(person, car) // may be uncertain
      • Suspects(person) = π_person(Saw ⋈ Drives)
  6. Our Model for Uncertainty
      1. Alternatives
      2. ‘?’ (Maybe) Annotations
      3. Confidences
  7. Our Model for Uncertainty
      1. Alternatives: uncertainty about value
      2. ‘?’ (Maybe) Annotations
      3. Confidences
      Saw (witness, car)
          witness | car
          Amy     | { Honda, Toyota, Mazda }
          = (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
      Three possible instances
  8. Our Model for Uncertainty
      1. Alternatives
      2. ‘?’ (Maybe): uncertainty about presence
      3. Confidences
      Saw (witness, car)
          (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
          (Betty, Acura)  ?
      Six possible instances
  9. Our Model for Uncertainty
      1. Alternatives
      2. ‘?’ (Maybe) Annotations
      3. Confidences: weighted uncertainty
      Saw (witness, car)
          (Amy, Honda): 0.5 ∥ (Amy, Toyota): 0.3 ∥ (Amy, Mazda): 0.2
          (Betty, Acura): 0.6  ?
      Six possible instances, each with a probability
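The count of six instances and their probabilities can be checked with a short Python sketch. It is only an illustration, not Trio code: the list-of-pairs encoding and the helper names `choices` and `possible_instances` are assumptions made here, and the sketch assumes that different tuples are independent and that a tuple whose confidences sum to less than 1 is absent with the remaining probability.

```python
from itertools import product

# Saw table from the slide: each tuple is a list of (alternative, confidence).
saw = [
    [(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Mazda"), 0.2)],
    [(("Betty", "Acura"), 0.6)],  # confidences sum to 0.6, so this is a maybe-tuple
]

def choices(xtuple):
    """All outcomes for one tuple: each alternative, plus absence (None)."""
    outcomes = list(xtuple)
    missing = 1.0 - sum(conf for _, conf in xtuple)
    if missing > 1e-9:
        outcomes.append((None, missing))
    return outcomes

def possible_instances(table):
    """Return (instance, probability) pairs, assuming tuples are independent."""
    result = []
    for combo in product(*(choices(x) for x in table)):
        instance = frozenset(alt for alt, _ in combo if alt is not None)
        prob = 1.0
        for _, p in combo:
            prob *= p
        result.append((instance, prob))
    return result

instances = possible_instances(saw)
for instance, prob in instances:
    print(sorted(instance), round(prob, 2))
```

The Amy tuple contributes three outcomes and the Betty tuple two (present or absent), which gives the six instances named on the slide.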
  10. Models for Uncertainty
      • Our model (so far) is not especially new
      • We spent some time exploring the space of models for uncertainty [ICDE 06, journal]
      • Tension between understandability and expressiveness
          • Our model is understandable
          • But it is not complete, or even closed under common operations
  11. Our Model is Not Closed
      Saw (witness, car)
          (Cathy, Honda) ∥ (Cathy, Mazda)
      Drives (person, car)
          (Jimmy, Toyota) ∥ (Jimmy, Mazda)
          (Billy, Honda) ∥ (Frank, Honda)
          (Hank, Honda)
      Suspects = π_person(Saw ⋈ Drives)
          Jimmy  ?
          Billy ∥ Frank  ?
          Hank  ?
      Does not, and CANNOT, correctly capture the possible instances in the result
  12. Lineage to the Rescue
      • Lineage
          • Captures “where data came from”
          • In Trio: a function λ from alternatives to other alternatives (or external sources)
  13. Example with Lineage
      Saw (witness, car)
          ID | tuple
          11 | (Cathy, Honda) ∥ (Cathy, Mazda)
      Drives (person, car)
          ID | tuple
          21 | (Jimmy, Toyota) ∥ (Jimmy, Mazda)
          22 | (Billy, Honda) ∥ (Frank, Honda)
          23 | (Hank, Honda)
      Suspects = π_person(Saw ⋈ Drives)
          ID | tuple
          31 | Jimmy  ?
          32 | Billy ∥ Frank  ?
          33 | Hank  ?
      Lineage
          λ(31) = (11,2), (21,2)
          λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
          λ(33) = (11,1), 23
      Correctly captures possible instances in the result
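To see why lineage repairs the representation, the Python sketch below enumerates every choice of base alternatives and keeps a Suspects alternative only when all alternatives in its lineage were chosen. It is a minimal illustration, not Trio's implementation: the dictionary encoding and the function name `suspect_instances` are assumptions made here, and the assignment of IDs to rows follows the λ formulas on the slide.

```python
from itertools import product

# Base tuples: ID -> alternatives (alternative indexes are 1-based, as on the slide).
base = {
    11: [("Cathy", "Honda"), ("Cathy", "Mazda")],
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

# Lineage λ: Suspects alternative -> (person, base alternatives it derives from).
suspects = {
    (31, 1): ("Jimmy", [(11, 2), (21, 2)]),
    (32, 1): ("Billy", [(11, 1), (22, 1)]),
    (32, 2): ("Frank", [(11, 1), (22, 2)]),
    (33, 1): ("Hank", [(11, 1), (23, 1)]),
}

def suspect_instances():
    """Distinct Suspects instances over all choices of base alternatives."""
    ids = sorted(base)
    results = set()
    for picks in product(*(range(1, len(base[i]) + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        present = frozenset(
            person
            for person, lineage in suspects.values()
            if all(src in chosen for src in lineage)
        )
        results.add(present)
    return results

with_lineage = suspect_instances()
# Ignoring lineage, Jimmy?, (Billy ∥ Frank)?, and Hank? vary independently.
without_lineage_count = 2 * 3 * 2
print(len(with_lineage), without_lineage_count)
```

Under these assumptions the lineage admits four instances, while the same three tuples read without lineage would admit twelve, including combinations such as Jimmy together with Hank that no choice of base data can produce.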
  14. Uncertainty-Lineage Databases (ULDBs)
      1. Alternatives
      2. ‘?’ (Maybe) Annotations
      3. Confidences
      4. Lineage
      • ULDBs are closed and complete [VLDB 06]
  15. ULDBs: Lineage
      • Conjunctive lineage sufficient for most operations
      • Duplicate-elimination: disjunctive lineage
      • Difference: negative lineage
      • General case after multiple operations/queries: Boolean formula
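The three kinds of lineage can be pictured as one small Boolean formula evaluator. The Python sketch below is an illustration under assumptions: the nested-tuple encoding, the function name `holds`, and the two example formulas are hypothetical and do not come from Trio; only the tuple IDs are borrowed from the crime-solving example.

```python
def holds(formula, chosen):
    """Return True if `formula` is satisfied when exactly `chosen` alternatives exist.

    Hypothetical encoding for this sketch:
      ("alt", tuple_id, alt_index)  a base alternative
      ("and", f1, f2, ...)          conjunctive lineage (e.g. join)
      ("or", f1, f2, ...)           disjunctive lineage (e.g. duplicate elimination)
      ("not", f)                    negative lineage (e.g. difference)
    """
    kind = formula[0]
    if kind == "alt":
        return (formula[1], formula[2]) in chosen
    if kind == "and":
        return all(holds(f, chosen) for f in formula[1:])
    if kind == "or":
        return any(holds(f, chosen) for f in formula[1:])
    if kind == "not":
        return not holds(formula[1], chosen)
    raise ValueError(f"unknown node type: {kind!r}")

saw_honda = ("alt", 11, 1)
billy = ("and", saw_honda, ("alt", 22, 1))
frank = ("and", saw_honda, ("alt", 22, 2))
hank = ("and", saw_honda, ("alt", 23, 1))

# Illustrative derived facts (not from the slides).
some_honda_driver = ("or", billy, frank, hank)       # after duplicate elimination
hank_but_not_billy = ("and", hank, ("not", billy))   # after a difference

# One choice of base alternatives: Cathy saw a Honda, Frank drives the Honda.
chosen = {(11, 1), (21, 1), (22, 2), (23, 1)}
print(holds(some_honda_driver, chosen), holds(hank_but_not_billy, chosen))
```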
  16. ULDBs: Interesting Questions
      • Data-minimality: extraneous alternatives, extraneous “?”
      • Lineage-minimality: harder
      • Membership: tuple and table, some-instance and all-instances
      • Coexistence: multiple tuples
      • Extraction: remove tables, retain possible-instances
  17. Example: Extraneous Data
      [Figure: lineage diagram over the tuples (Diane, Mazda) ∥ (Diane, Acura), (Diane, Mazda), (Diane, Acura), and Diane, with three “?” annotations and one element labeled “extraneous”]
  18. Example: Coexistence
      [Figure: lineage diagram over the tuples (Diane, Mazda) ∥ (Diane, Acura), (Diane, Mazda), (Diane, Acura), Mazda, and Acura, with four “?” annotations; Mazda and Acura are labeled “Can’t coexist”]
  19. Querying ULDBs: Semantics
      • Query Q on ULDB D
      [Commutative diagram]
          D  -> (possible instances) ->  D1, D2, …, Dn
          D1, D2, …, Dn  -> (Q on each instance) ->  Q(D1), Q(D2), …, Q(Dn)
          D  -> (implementation of Q; operational semantics) ->  D′ (D + Result)
          D′  -> (representation of instances) ->  Q(D1), Q(D2), …, Q(Dn)
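The upper path of the diagram (expand D into its possible instances, then run Q on each one) can be written out directly for the crime-solving example. This Python sketch is an assumption-laden illustration, not Trio code: the helper names `instances` and `q` are invented here, and the tables contain alternatives only, with no “?” tuples.

```python
from itertools import product

# Base tables from the running example; each inner list is one tuple's alternatives.
saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [
    [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    [("Billy", "Honda"), ("Frank", "Honda")],
    [("Hank", "Honda")],
]

def instances(table):
    """Possible instances of a table that has alternatives but no '?' tuples."""
    return [set(combo) for combo in product(*table)]

def q(saw_inst, drives_inst):
    """Suspects: project person from the join of Saw and Drives on car."""
    return frozenset(
        person
        for (_, seen_car) in saw_inst
        for (person, driven_car) in drives_inst
        if seen_car == driven_car
    )

# Q(D1), ..., Q(Dn): the set of instances any correct result D' must represent.
expected = {q(s, d) for s in instances(saw) for d in instances(drives)}
for inst in sorted(expected, key=sorted):
    print(sorted(inst))
```

An implementation of Q satisfies the diagram when the possible instances of its result D′ are exactly this set.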
  20. Querying ULDBs: TriQL
      • Basic TriQL: SQL with new semantics
          • Obeys commutative diagram for uncertain data
          • Tracks lineage
          • Query results: new table or on-the-fly
      • Implemented TriQL: also built-in predicates conf(), lineage(), lineage*()
  21. Additional TriQL Constructs
      • [Language manual on web site]
      • “Horizontal subqueries”
          • Refer to tuple alternatives as a relation
      • Unmerged (horizontal duplicates)
      • Flatten, GroupAlts
      • NoLineage, NoConf, NoMaybe
      • Query-specified confidences [done]
      • Data modification statements
  22. Confidence Computation
      • Confidences computed on-demand based on lineage
          • Confidence of alternative A is function of confidences in λ*(A)
          • Permits any query plan for data computation
      • Default probabilistic interpretation, but queries can override
      SELECT person, min(conf(Saw),conf(Drives)) as conf
      FROM Saw, Drives
      WHERE =
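A rough Python sketch of on-demand confidence computation follows. Everything in it is an assumption for illustration: the alternative names, the confidences 0.6 and 0.5, and the helper names are hypothetical, and the product rule shown for the default probabilistic interpretation is only valid when lineage is purely conjunctive and the base alternatives come from distinct, independent tuples.

```python
from math import prod

# Hypothetical base confidences and lineage (not taken from the slides).
base_conf = {"saw1": 0.6, "drives1": 0.5}
lineage = {
    "suspect1": ["saw1", "drives1"],  # produced by the join
    "report1": ["suspect1"],          # produced by a later query over Suspects
}

def base_lineage(alt):
    """λ*(alt): follow lineage transitively down to base alternatives."""
    if alt not in lineage:
        return {alt}
    found = set()
    for src in lineage[alt]:
        found |= base_lineage(src)
    return found

def conf_probabilistic(alt):
    """Product of base confidences (conjunctive lineage, independent tuples)."""
    return prod(base_conf[b] for b in base_lineage(alt))

def conf_min(alt):
    """Query-specified override in the spirit of min(conf(Saw), conf(Drives))."""
    return min(base_conf[b] for b in base_lineage(alt))

print(round(conf_probabilistic("report1"), 2), conf_min("report1"))
```

Because the confidence depends only on λ*(A), it can be computed after the data itself, whichever query plan produced the result.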
  23. Trio System: Version 1
      • Command-line client
      • TrioExplorer (GUI client)
      • Trio API and translator (Python)
          • DDL commands
          • TriQL queries
          • Schema browsing
          • Table browsing
          • Explore lineage
          • On-demand confidence computation
      • Standard SQL
      • Standard relational DBMS
          • Encoded Data Tables
              • “Verticalize”
              • Shared IDs for alternatives
              • Columns for confidence, “?”
          • Lineage Tables
              • One per result table
              • Uses unique IDs
          • Trio Metadata
              • Table types
              • Schema-level lineage structure
          • Trio Stored Procedures
              • conf()
              • lineage() “==>”
              • lineage*() “==>>”
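One way to picture the “verticalized” encoding is a conventional table with one row per alternative. The sketch below uses Python's built-in `sqlite3` module; the table and column names (`saw_enc`, `suspects_lin`, `aid`, `xid`, `maybe`) are hypothetical choices for this illustration and are not Trio's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE saw_enc (
    aid     INTEGER PRIMARY KEY,  -- unique ID of this alternative
    xid     INTEGER NOT NULL,     -- shared by all alternatives of one tuple
    witness TEXT,
    car     TEXT,
    conf    REAL,                 -- confidence column
    maybe   INTEGER NOT NULL      -- 1 if the tuple carries the maybe annotation
);
CREATE TABLE suspects_lin (       -- one lineage table per result table
    aid       INTEGER,            -- alternative in the result table
    src_table TEXT,               -- table the source alternative lives in
    src_aid   INTEGER             -- unique ID of the source alternative
);
""")

# Saw table from the confidences slide, one row per alternative.
conn.executemany(
    "INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Amy", "Honda", 0.5, 0),
        (2, 1, "Amy", "Toyota", 0.3, 0),
        (3, 1, "Amy", "Mazda", 0.2, 0),
        (4, 2, "Betty", "Acura", 0.6, 1),
    ],
)

# Regroup the alternatives of each tuple using the shared ID.
rows = conn.execute(
    "SELECT xid, COUNT(*), MAX(maybe) FROM saw_enc GROUP BY xid ORDER BY xid"
).fetchall()
print(rows)
```

Keeping one row per alternative lets ordinary SQL operate on the encoded data, with the shared ID available whenever a whole tuple has to be reassembled.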
  24. Current & Future Topics
      • Algorithms: confidence computation, coexistence, extraneous data
          • Minimize lineage traversal
          • Memoization
          • Batch operations
      • System
          • Full query language
          • More internal processing?
              • Storage and indexing
              • Statistics and query optimization
  25. Current & Future Topics
      • Top-K by confidence
      • Extend basic uncertainty model
          • Incomplete relations
          • Continuous uncertainty
          • Correlated uncertainty?
      • External lineage, update lineage, versioning