Your SlideShare is downloading. ×
Trio Meeting
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Trio Meeting

195
views

Published on

Published in: Technology, Sports

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
195
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio” http://i.stanford.edu/trio DATA UNCERTAINTY LINEAGE
  • 2. People
    • Current
      • Jennifer Widom (faculty)
      • Omar Benjelloun (post-doc)
      • Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD)
      • Michi Mutsuzaki (MS)
      • Tomoe Sugihara (visitor)
    • Incoming
      • Martin Theobald (post-doc)
      • Raghu Murthy (MS)
      • Ander de Keijzer (visitor)
    • Alums
      • Alon Halevy, Ashok Chandra (visitors)
      • Chris Hayworth (MS)
  • 3. Why Uncertainty + Lineage?
    • Many applications seem to need both
    • From a technical standpoint, it turns out that
    • lineage...
      • Enables simple and consistent representation of uncertain data
      • Correlates uncertainty in query results with uncertainty in the input data
      • Can make computation over uncertain data more efficient
  • 4. Trio Components
    • Data Model
      • ULDBs (Uncertainty-Lineage Databases):
      • Simple extension to relational model
    • Query Language
      • TriQL: Simple extension to SQL, well-defined semantics and intuitive behavior
    • System
      • Version 1: Complete system and GUI built on top of conventional DBMS
  • 5. Running Example: Crime-Solving
    • Saw( witness , car ) // may be uncertain
    • Drives( person , car ) // may be uncertain
    • Suspects( person ) = π person (Saw ⋈ Drives)
  • 6. Our Model for Uncertainty
    • 1. Alternatives
    • 2. ‘?’ (Maybe) Annotations
    • 3. Confidences
  • 7. Our Model for Uncertainty
    • 1. Alternatives: uncertainty about value
    • 2. ‘?’ (Maybe) Annotations
    • 3. Confidences
    = Three possible instances (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda) Saw (witness,car) { Honda, Toyota, Mazda } car Amy witness
  • 8. Our Model for Uncertainty
    • 1. Alternatives
    • 2. ‘?’ (Maybe): uncertainty about presence
    • 3. Confidences
    Six possible instances ? (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda) (Betty, Acura) Saw (witness,car)
  • 9. Our Model for Uncertainty
    • 1. Alternatives
    • 2. ‘?’ (Maybe) Annotations
    • 3. Confidences: weighted uncertainty
    ? Six possible instances, each with a probability (Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy, Mazda): 0.2 (Betty, Acura): 0.6 Saw (witness,car)
  • 10. Models for Uncertainty
    • Our model (so far) is not especially new
    • We spent some time exploring the space of models for uncertainty [ICDE 06, journal]
    • Tension between understandability and expressiveness
      • Our model is understandable
      • But it is not complete, or even closed under common operations
  • 11. Our Model is Not Closed Suspects = π person (Saw ⋈ Drives) ? ? ? Does not correctly capture possible instances in the result CANNOT (Cathy, Honda) ∥ (Cathy, Mazda) Saw (witness,car) (Billy, Honda) ∥ (Frank, Honda) (Hank, Honda) (Jimmy, Toyota) ∥ (Jimmy, Mazda) Drives (person,car) Jimmy Billy ∥ Frank Hank Suspects
  • 12. Lineage to the Rescue
    • Lineage
      • Captures “where data came from”
      • In Trio: A function λ from alternatives to other alternatives (or external sources)
  • 13. Example with Lineage ? ? ? Suspects = π person (Saw ⋈ Drives) λ (31) = (11,2),(21,2) λ (32,1) = (11,1),(22,1); λ (32,2) = (11,1),(22,2) λ (33) = (11,1), 23 11 ID (Cathy, Honda) ∥ (Cathy, Mazda) Saw (witness,car) 23 22 21 ID (Billy, Honda) ∥ (Frank, Honda) (Hank, Honda) (Jimmy, Toyota) ∥ (Jimmy, Mazda) Drives (person,car) 33 32 31 ID Jimmy Billy ∥ Frank Hank Suspects Correctly captures possible instances in the result
  • 14. Uncertainty-Lineage Databases (ULDBs)
    • 1. Alternatives
    • 2. ‘?’ (Maybe) Annotations
    • 3. Confidences
    • 4. Lineage
    • ULDBs are closed and complete
    • [VLDB 06]
  • 15. ULDBs: Lineage
    • Conjunctive lineage sufficient for most operations
    • Duplicate-elimination: Disjunctive lineage
    • Difference: Negative lineage
    • General case after multiple operations/queries: Boolean formula
  • 16. ULDBs: Interesting Questions
    • Data-minimality: extraneous alternatives, extraneous “?”
    • Lineage-minimality: harder
    • Membership: tuple and table, some-instance and all-instances
    • Coexistence: multiple tuples
    • Extraction: remove tables, retain possible-instances
  • 17. Example: Extraneous Data extraneous ? ? ? (Diane, Mazda) ∥ (Diane, Acura) Diane (Diane, Mazda) (Diane, Acura)
  • 18. Example: Coexistence ? ? ? ? Can’t coexist Mazda Acura (Diane, Mazda) ∥ (Diane, Acura) (Diane, Mazda) (Diane, Acura)
  • 19. Querying ULDBs: Semantics
    • Query Q on ULDB D
    D D 1 , D 2 , …, D n possible instances Q on each instance representation of instances Q(D 1 ), Q(D 2 ), …, Q ( D n ) D’ implementation of Q operational semantics D + Result
  • 20. Querying ULDBs: TriQL
    • Basic TriQL: SQL with new semantics
      • Obeys commutative diagram for uncertain data
      • Tracks lineage
      • Query results: new table or on-the-fly
    • Implemented TriQL: also built-in predicates conf(), lineage(), lineage*()
  • 21. Additional TriQL Constructs
    • [Language manual on web site]
    • “ Horizontal subqueries”
      • Refer to tuple alternatives as a relation
    • Unmerged (horizontal duplicates)
    • Flatten , GroupAlts
    • NoLineage , NoConf , NoMaybe
    • Query-specified confidences [done]
    • Data modification statements
  • 22. Confidence Computation
    • Confidences computed on-demand based on lineage
      • Confidence of alternative A is function of confidences in λ * ( A )
      • Permits any query plan for data computation
    • Default probabilistic interpretation, but queries can override
    SELECT person, min(conf(Saw),conf(Drives)) as conf FROM Saw, Drives WHERE Saw.car = Drives.car
  • 23. Trio System: Version 1 Standard relational DBMS Trio API and translator (Python) Command-line client Trio Metadata TrioExplorer (GUI client) Trio Stored Procedures Encoded Data Tables Lineage Tables Standard SQL
    • “ Verticalize”
    • Shared IDs for
    • alternatives
    • Columns for
    • confidence,“?”
    • One per result
    • table
    • Uses unique IDs
    • Table types
    • Schema-level
    • lineage structure
    • conf()
    • lineage() “==>”
    • lineage*() “==>>”
    • DDL commands
    • TriQL queries
    • Schema browsing
    • Table browsing
    • Explore lineage
    • On-demand
    • confidence
    • computation
  • 24. Current & Future Topics
    • Algorithms: confidence computation, coexistence
    • extraneous data
      • Minimize lineage traversal
      • Memoization
      • Batch operations
    • System
      • Full query language
      • More internal processing ?
        • Storage and indexing
        • Statistics and query optimization
  • 25. Current & Future Topics
    • Top- K by confidence
    • Extend basic uncertainty model
      • Incomplete relations
      • Continuous uncertainty
      • Correlated uncertainty ?
    • External lineage,
    • update lineage,
    • versioning