Trio Meeting


Published on

Published in: Technology, Sports
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Trio Meeting

  1. 1. Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio” DATA UNCERTAINTY LINEAGE
  2. 2. People <ul><li>Current </li></ul><ul><ul><li>Jennifer Widom (faculty) </li></ul></ul><ul><ul><li>Omar Benjelloun (post-doc) </li></ul></ul><ul><ul><li>Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD) </li></ul></ul><ul><ul><li>Michi Mutsuzaki (MS) </li></ul></ul><ul><ul><li>Tomoe Sugihara (visitor) </li></ul></ul><ul><li>Incoming </li></ul><ul><ul><li>Martin Theobald (post-doc) </li></ul></ul><ul><ul><li>Raghu Murthy (MS) </li></ul></ul><ul><ul><li>Ander de Keijzer (visitor) </li></ul></ul><ul><li>Alums </li></ul><ul><ul><li>Alon Halevy, Ashok Chandra (visitors) </li></ul></ul><ul><ul><li>Chris Hayworth (MS) </li></ul></ul>
  3. 3. Why Uncertainty + Lineage? <ul><li>Many applications seem to need both </li></ul><ul><li>From a technical standpoint, it turns out that </li></ul><ul><li>lineage... </li></ul><ul><ul><li>Enables simple and consistent representation of uncertain data </li></ul></ul><ul><ul><li>Correlates uncertainty in query results with uncertainty in the input data </li></ul></ul><ul><ul><li>Can make computation over uncertain data more efficient </li></ul></ul>
  4. 4. Trio Components <ul><li>Data Model </li></ul><ul><ul><li>ULDBs (Uncertainty-Lineage Databases): </li></ul></ul><ul><ul><li>Simple extension to relational model </li></ul></ul><ul><li>Query Language </li></ul><ul><ul><li>TriQL: Simple extension to SQL, well-defined semantics and intuitive behavior </li></ul></ul><ul><li>System </li></ul><ul><ul><li>Version 1: Complete system and GUI built on top of conventional DBMS </li></ul></ul>
  5. 5. Running Example: Crime-Solving <ul><li>Saw( witness , car ) // may be uncertain </li></ul><ul><li>Drives( person , car ) // may be uncertain </li></ul><ul><li>Suspects( person ) = π person (Saw ⋈ Drives) </li></ul>
  6. 6. Our Model for Uncertainty <ul><li>1. Alternatives </li></ul><ul><li>2. ‘?’ (Maybe) Annotations </li></ul><ul><li>3. Confidences </li></ul>
  7. 7. Our Model for Uncertainty <ul><li>1. Alternatives: uncertainty about value </li></ul><ul><li>2. ‘?’ (Maybe) Annotations </li></ul><ul><li>3. Confidences </li></ul>= Three possible instances (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda) Saw (witness,car) { Honda, Toyota, Mazda } car Amy witness
  8. 8. Our Model for Uncertainty <ul><li>1. Alternatives </li></ul><ul><li>2. ‘?’ (Maybe): uncertainty about presence </li></ul><ul><li>3. Confidences </li></ul>Six possible instances ? (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda) (Betty, Acura) Saw (witness,car)
  9. 9. Our Model for Uncertainty <ul><li>1. Alternatives </li></ul><ul><li>2. ‘?’ (Maybe) Annotations </li></ul><ul><li>3. Confidences: weighted uncertainty </li></ul>? Six possible instances, each with a probability (Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy, Mazda): 0.2 (Betty, Acura): 0.6 Saw (witness,car)
  10. 10. Models for Uncertainty <ul><li>Our model (so far) is not especially new </li></ul><ul><li>We spent some time exploring the space of models for uncertainty [ICDE 06, journal] </li></ul><ul><li>Tension between understandability and expressiveness </li></ul><ul><ul><li>Our model is understandable </li></ul></ul><ul><ul><li>But it is not complete, or even closed under common operations </li></ul></ul>
  11. 11. Our Model is Not Closed Suspects = π person (Saw ⋈ Drives) ? ? ? Does not correctly capture possible instances in the result CANNOT (Cathy, Honda) ∥ (Cathy, Mazda) Saw (witness,car) (Billy, Honda) ∥ (Frank, Honda) (Hank, Honda) (Jimmy, Toyota) ∥ (Jimmy, Mazda) Drives (person,car) Jimmy Billy ∥ Frank Hank Suspects
  12. 12. Lineage to the Rescue <ul><li>Lineage </li></ul><ul><ul><li>Captures “where data came from” </li></ul></ul><ul><ul><li>In Trio: A function λ from alternatives to other alternatives (or external sources) </li></ul></ul>
  13. 13. Example with Lineage ? ? ? Suspects = π person (Saw ⋈ Drives) λ (31) = (11,2),(21,2) λ (32,1) = (11,1),(22,1); λ (32,2) = (11,1),(22,2) λ (33) = (11,1), 23 11 ID (Cathy, Honda) ∥ (Cathy, Mazda) Saw (witness,car) 23 22 21 ID (Billy, Honda) ∥ (Frank, Honda) (Hank, Honda) (Jimmy, Toyota) ∥ (Jimmy, Mazda) Drives (person,car) 33 32 31 ID Jimmy Billy ∥ Frank Hank Suspects Correctly captures possible instances in the result
  14. 14. Uncertainty-Lineage Databases (ULDBs) <ul><li>1. Alternatives </li></ul><ul><li>2. ‘?’ (Maybe) Annotations </li></ul><ul><li>3. Confidences </li></ul><ul><li>4. Lineage </li></ul><ul><li>ULDBs are closed and complete </li></ul><ul><li>[VLDB 06] </li></ul>
  15. 15. ULDBs: Lineage <ul><li>Conjunctive lineage sufficient for most operations </li></ul><ul><li>Duplicate-elimination: Disjunctive lineage </li></ul><ul><li>Difference: Negative lineage </li></ul><ul><li>General case after multiple operations/queries: Boolean formula </li></ul>
  16. 16. ULDBs: Interesting Questions <ul><li>Data-minimality: extraneous alternatives, extraneous “?” </li></ul><ul><li>Lineage-minimality: harder </li></ul><ul><li>Membership: tuple and table, some-instance and all-instances </li></ul><ul><li>Coexistence: multiple tuples </li></ul><ul><li>Extraction: remove tables, retain possible-instances </li></ul>
  17. 17. Example: Extraneous Data extraneous ? ? ? (Diane, Mazda) ∥ (Diane, Acura) Diane (Diane, Mazda) (Diane, Acura)
  18. 18. Example: Coexistence ? ? ? ? Can’t coexist Mazda Acura (Diane, Mazda) ∥ (Diane, Acura) (Diane, Mazda) (Diane, Acura)
  19. 19. Querying ULDBs: Semantics <ul><li>Query Q on ULDB D </li></ul>D D 1 , D 2 , …, D n possible instances Q on each instance representation of instances Q(D 1 ), Q(D 2 ), …, Q ( D n ) D’ implementation of Q operational semantics D + Result
  20. 20. Querying ULDBs: TriQL <ul><li>Basic TriQL: SQL with new semantics </li></ul><ul><ul><li>Obeys commutative diagram for uncertain data </li></ul></ul><ul><ul><li>Tracks lineage </li></ul></ul><ul><ul><li>Query results: new table or on-the-fly </li></ul></ul><ul><li>Implemented TriQL: also built-in predicates conf(), lineage(), lineage*() </li></ul>
  21. 21. Additional TriQL Constructs <ul><li>[Language manual on web site] </li></ul><ul><li>“ Horizontal subqueries” </li></ul><ul><ul><li>Refer to tuple alternatives as a relation </li></ul></ul><ul><li>Unmerged (horizontal duplicates) </li></ul><ul><li>Flatten , GroupAlts </li></ul><ul><li>NoLineage , NoConf , NoMaybe </li></ul><ul><li>Query-specified confidences [done] </li></ul><ul><li>Data modification statements </li></ul>
  22. 22. Confidence Computation <ul><li>Confidences computed on-demand based on lineage </li></ul><ul><ul><li>Confidence of alternative A is function of confidences in λ * ( A ) </li></ul></ul><ul><ul><li>Permits any query plan for data computation </li></ul></ul><ul><li>Default probabilistic interpretation, but queries can override </li></ul>SELECT person, min(conf(Saw),conf(Drives)) as conf FROM Saw, Drives WHERE =
  23. 23. Trio System: Version 1 Standard relational DBMS Trio API and translator (Python) Command-line client Trio Metadata TrioExplorer (GUI client) Trio Stored Procedures Encoded Data Tables Lineage Tables Standard SQL <ul><li>“ Verticalize” </li></ul><ul><li>Shared IDs for </li></ul><ul><li>alternatives </li></ul><ul><li>Columns for </li></ul><ul><li>confidence,“?” </li></ul><ul><li>One per result </li></ul><ul><li>table </li></ul><ul><li>Uses unique IDs </li></ul><ul><li>Table types </li></ul><ul><li>Schema-level </li></ul><ul><li>lineage structure </li></ul><ul><li>conf() </li></ul><ul><li>lineage() “==>” </li></ul><ul><li>lineage*() “==>>” </li></ul><ul><li>DDL commands </li></ul><ul><li>TriQL queries </li></ul><ul><li>Schema browsing </li></ul><ul><li>Table browsing </li></ul><ul><li>Explore lineage </li></ul><ul><li>On-demand </li></ul><ul><li>confidence </li></ul><ul><li>computation </li></ul>
  24. 24. Current & Future Topics <ul><li>Algorithms: confidence computation, coexistence </li></ul><ul><li>extraneous data </li></ul><ul><ul><li>Minimize lineage traversal </li></ul></ul><ul><ul><li>Memoization </li></ul></ul><ul><ul><li>Batch operations </li></ul></ul><ul><li>System </li></ul><ul><ul><li>Full query language </li></ul></ul><ul><ul><li>More internal processing ? </li></ul></ul><ul><ul><ul><li>Storage and indexing </li></ul></ul></ul><ul><ul><ul><li>Statistics and query optimization </li></ul></ul></ul>
  25. 25. Current & Future Topics <ul><li>Top- K by confidence </li></ul><ul><li>Extend basic uncertainty model </li></ul><ul><ul><li>Incomplete relations </li></ul></ul><ul><ul><li>Continuous uncertainty </li></ul></ul><ul><ul><li>Correlated uncertainty ? </li></ul></ul><ul><li>External lineage, </li></ul><ul><li>update lineage, </li></ul><ul><li>versioning </li></ul>