Mazda R27 1

Experience Mazda Zoom Zoom Lifestyle and Culture by Visiting and joining the Official Mazda Community at http://www.MazdaCommunity.org for additional insight into the Zoom Zoom Lifestyle and special offers for Mazda Community Members. If you live in Arizona, check out CardinaleWay Mazda's eCommerce website at http://www.Cardinale-Way-Mazda.com

Published in: Automotive, Sports, Technology

  • Say that the work was done in the context of the Trio project at Stanford.
  • Mention that, “at this point in time”, no conventional DBMS supports uncertainty or lineage, much less the two together.
  • Mention that people can find more discussion of applications in the original vision paper for the project.
  • Say what lineage is (briefly). Explain some motivating applications.
  • Say that you are going to substantiate the claim. On the following slide, emphasize: “here are the specific technical substantiations I am going to give”.
  • Make sure to say I’ll substantiate each statement technically.
  • Mention that ULDB forms the basis of the Trio Project
  • Say that Saw and Drives could be uncertain, and that Suspects could be uncertain and has lineage.
  • Say that an instance is a certain database.
  • Emphasize again that ULDBs are our model and basis of the Trio project. Emphasize that these are simple constructs.
  • Or-sets: We can abbreviate but it is more general the way it is.
  • Mention independence
  • Default probabilistic interpretation: confidences add up to <= 1, and if < 1, there’s an implicit “?”. Then say that I will talk in a little more depth about confidences toward the end of the talk.
  • Say it correctly!
  • Mention verbally that the talk is about internal lineage only.
  • Mention that we have added IDs to the data. I went too slow. Our semantics, once we have lineage, doesn’t allow inconsistent instances.
  • Important to explain: we keep base data with query results (emphasize a bit more). IMPORTANT: say “a certain class of queries”, and say intuitively what kind of class.
  • The model can allow lineage all over the place. In reality there is a certain structure to lineage (which makes other properties nice). We define a more restricted kind of lineage in the normal case. Let me give an intuition of that. Say that we work with well-behaved ULDBs.
  • Mention verbally why we care about minimality. Say less about minimality: just say the paper has some results and also poses one open problem.
  • Mention the example is very simple.
  • You'll want to use it to recap what you've already said and what properties are retained by what operations, plus to introduce two additional operations you're going to discuss.
  • Mention very early that extraction is trivial in a conventional database (just remove relations), but poses an interesting problem in ULDBs because relations depend on other relations through lineage. In other words, the possible instances of R may be determined by the possible instances of T and V, as we saw in our original join example that demonstrated closure.
  • “Let me just say a little more about the interesting topic of confidences…”. No need to read confidence values aloud. Can motivate “custom” confidences by saying something like: base data might be correlated (e.g., sensors), and you need things outside the DB to compute confidences.
  • Mention more details in the paper.
  • Mention that all the algorithms are closely related. (The extraneous data removal and membership are also confidence questions.)
  • Say, “now let’s look at current work in the Trio project as a whole”. To conclude, let me give you the big picture of the project... Mention that this paper and talk are mainly about the data model. After ULDBs: add “(coming… correlations, continuous…)”.
  • Say here’s a small sample, and there’s more in DB and AI, see the paper.

    1. 1. Uncertainty Lineage Data Bases Very Large Data Bases 1975 2006
    2. 2. ULDBs: Databases with Uncertainty and Lineage
        Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom
        Stanford InfoLab
        DATA UNCERTAINTY LINEAGE
    3. 3. Motivation
        • Many applications involve data that is uncertain (approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...)
        • Many of the same applications need to track the lineage of their data
        • Neither uncertainty nor lineage are supported by conventional DBMSs
        Coincidence or Fate?
    4. 4. Sample Applications Needing Uncertainty and Lineage
        • Scientific databases
        • Sensor databases
        • Data cleaning
        • Data integration
        • Information extraction
    5. 5. Trio Project
        • Building a new kind of DBMS in which:
            • Data
            • Uncertainty
            • Lineage
        • are all first-class interrelated concepts
    6. 6. Lineage and Uncertainty
        • Lots of independent work in lineage and uncertainty (related work at end of talk)
        • Turns out: The connection between uncertainty and lineage goes deeper than just a shared need by several applications
        Coincidence or Fate?
    7. 7. Lineage and Uncertainty
        • Lineage...
            • Enables simple and consistent representation of uncertain data
            • Correlates uncertainty in query results with uncertainty in the input data
            • Can make computation over uncertain data more efficient
    8. 8. Outline of the Talk
        • The ULDB data model
        • Querying ULDBs
        • ULDB properties
        • Membership and extraction operations
        • Confidences
        • Current, related, and future work
    9. 9. Running Example: Crime Solver
        • Saw(witness, car)
        • Drives(person, car)
        • Suspects(person) = π_person(Saw ⋈ Drives)
    10. 10. Uncertainty
        • An uncertain database represents a set of possible instances. Examples:
            • Amy saw either a Honda or a Toyota
            • Jimmy drives a Toyota, a Mazda, or both
            • Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3
            • Hank is a suspect with confidence 0.7
    11. 11. Uncertainty in a ULDB
        • 1. Alternatives
        • 2. ‘?’ (Maybe) Annotations
        • 3. Confidences
    12. 12. Uncertainty in a ULDB
        • 1. Alternatives: uncertainty about value
        • 2. ‘?’ (Maybe) Annotations
        • 3. Confidences
        Saw (witness, car)
            witness | car
            Amy     | { Honda, Toyota, Mazda }
        =
        Saw (witness, car)
            (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
        Three possible instances
    13. 13. Uncertainty in a ULDB
        • 1. Alternatives
        • 2. ‘?’ (Maybe): uncertainty about presence
        • 3. Confidences
        Saw (witness, car)
            (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
            (Betty, Acura) ?
        Six possible instances
    14. 14. Uncertainty in a ULDB
        • 1. Alternatives
        • 2. ‘?’ (Maybe) Annotations
        • 3. Confidences: weighted uncertainty
        Saw (witness, car)
            (Amy, Honda): 0.5 ∥ (Amy, Toyota): 0.3 ∥ (Amy, Mazda): 0.2
            (Betty, Acura): 0.6 ?
        Six possible instances, each with a probability
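The count of six possible instances on this slide can be checked with a minimal sketch. It assumes the two tuples are independent (the notes say to mention independence) and that confidences summing to less than 1 leave an implicit '?'. The names `saw`, `choices`, and `possible_instances` are illustrative, not part of Trio.

```python
from itertools import product

# Each tuple is a list of (alternative, confidence) pairs, as on the slide.
saw = [
    [(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Mazda"), 0.2)],
    [(("Betty", "Acura"), 0.6)],
]

def choices(xtuple):
    """Alternatives of one tuple, plus 'absent' when confidences sum to < 1."""
    options = list(xtuple)
    missing = 1.0 - sum(conf for _, conf in xtuple)
    if missing > 1e-9:  # implicit '?': the tuple may be absent
        options.append((None, missing))
    return options

def possible_instances(relation):
    """Enumerate (instance, probability) pairs, assuming independent tuples."""
    result = []
    for combo in product(*(choices(x) for x in relation)):
        instance = frozenset(alt for alt, _ in combo if alt is not None)
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        result.append((instance, prob))
    return result

instances = possible_instances(saw)
for instance, prob in instances:
    print(sorted(instance), round(prob, 2))
```

Amy's tuple contributes three choices and Betty's contributes two (present or absent), which gives the six instances.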
    15. 15. Data Models for Uncertainty
        • Our model (so far) is not especially new
        • We spent some time exploring the space of models for uncertainty [ICDE 2006]
        • Tension between understandability and expressiveness
            • Our model is understandable
            • But it is not complete, or even closed under common operations
    16. 16. Closure and Completeness
        • Completeness
            • Can represent all sets of possible instances
        • Closure
            • Can represent results of operations
        • Note: Completeness ⇒ Closure
    17. 17. Model (so far) Not Closed
        Suspects = π_person(Saw ⋈ Drives)
        Saw (witness, car)
            (Cathy, Honda) ∥ (Cathy, Mazda)
        Drives (person, car)
            (Jimmy, Toyota) ∥ (Jimmy, Mazda)
            (Billy, Honda) ∥ (Frank, Honda)
            (Hank, Honda)
        Suspects
            Jimmy ?
            Billy ∥ Frank ?
            Hank ?
        CANNOT
        Does not correctly capture possible instances in the result
    18. 18. Lineage to the Rescue
        • Lineage: “where data came from”
            • Internal lineage
            • External lineage (not covered in this talk)
        • In ULDBs: A function λ from alternatives to sets of alternatives (or external sources)
    19. 19. Example with Lineage
        Suspects = π_person(Saw ⋈ Drives)
        Saw (witness, car)
            ID 11: (Cathy, Honda) ∥ (Cathy, Mazda)
        Drives (person, car)
            ID 21: (Jimmy, Toyota) ∥ (Jimmy, Mazda)
            ID 22: (Billy, Honda) ∥ (Frank, Honda)
            ID 23: (Hank, Honda)
        Suspects
            ID 31: Jimmy ?            λ(31) = (11,2), (21,2)
            ID 32: Billy ∥ Frank ?    λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
            ID 33: Hank ?             λ(33) = (11,1), 23
        Correctly captures possible instances in the result
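A minimal sketch of why lineage restores closure in this example: pick one alternative per base tuple, then keep a Suspects alternative only if every alternative in its lineage was picked. The dictionary encoding and the function name `suspect_instances` are assumptions made for illustration. The single-alternative reference `23` in λ(33) is written here as `(23, 1)`.

```python
from itertools import product

# Base tuples: ID -> list of alternatives (lineage refers to them 1-based).
saw = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
drives = {
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

# Derived alternatives: (tuple ID, alternative index) -> (value, lineage).
suspects = {
    (31, 1): ("Jimmy", {(11, 2), (21, 2)}),
    (32, 1): ("Billy", {(11, 1), (22, 1)}),
    (32, 2): ("Frank", {(11, 1), (22, 2)}),
    (33, 1): ("Hank", {(11, 1), (23, 1)}),
}

def suspect_instances():
    """Return the set of possible Suspects instances implied by the lineage."""
    base = {**saw, **drives}
    ids = sorted(base)
    seen = set()
    for picks in product(*(range(1, len(base[i]) + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        # A derived alternative exists only if its whole lineage was chosen.
        seen.add(frozenset(v for v, lin in suspects.values() if lin <= chosen))
    return seen

for instance in sorted(suspect_instances(), key=sorted):
    print(sorted(instance))
```

Under this encoding, Jimmy never appears together with Hank, Billy, or Frank, because they depend on different alternatives of tuple 11. Three independent '?' annotations alone could not express that.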
    20. 20. ULDBs
        • Alternatives
        • ‘?’ (Maybe) Annotations
        • Confidences
        • Lineage
        ULDBs are Closed and Complete
    21. 21. Outline of the Talk
        • The ULDB data model
        • Querying ULDBs
        • ULDB properties
        • Membership and extraction operations
        • Confidences
        • Current, related, and future work
    22. 22. Querying ULDBs
        • Query Q on ULDB D
        [Diagram: D → (possible instances) → D1, D2, …, Dn → (Q on each instance) → Q(D1), Q(D2), …, Q(Dn)]
        [Diagram: D → (implementation of Q) → D’ (D + Result) → (representation of instances) → Q(D1), Q(D2), …, Q(Dn)]
    23. 23. Well-Behaved ULDBs
        • If we start with a well-behaved ULDB and perform standard queries, it remains well-behaved
        • Intuitively (details in paper):
            • Acyclic: No cycles in the lineage
            • Deterministic: Non-empty lineages of distinct alternatives are distinct
            • Uniform: Alternatives of same tuple are derived from the same set of tuples
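The three intuitive conditions can be sketched as checks over a lineage map. This follows only the one-line intuitions on the slide, not the formal definitions in the paper, and the encoding (alternatives as `(tuple ID, alternative index)` pairs, lineage as a dict of sets) is an assumption for illustration.

```python
def is_acyclic(lineage):
    """No alternative is reachable from itself by following lineage."""
    state = {}  # alternative -> "visiting" or "done"

    def visit(alt):
        if state.get(alt) == "done":
            return True
        if state.get(alt) == "visiting":
            return False  # back edge: a cycle
        state[alt] = "visiting"
        ok = all(visit(parent) for parent in lineage.get(alt, ()))
        state[alt] = "done"
        return ok

    return all(visit(alt) for alt in lineage)

def is_deterministic(lineage):
    """Non-empty lineages of distinct alternatives are distinct."""
    nonempty = [frozenset(lin) for lin in lineage.values() if lin]
    return len(nonempty) == len(set(nonempty))

def is_uniform(lineage):
    """Alternatives of the same tuple are derived from the same set of tuples."""
    sources = {}
    for (tuple_id, _), lin in lineage.items():
        src = frozenset(parent_id for parent_id, _ in lin)
        if sources.setdefault(tuple_id, src) != src:
            return False
    return True

# Lineage of Suspects from the crime-solver example (23 written as (23, 1)).
suspects_lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
print(is_acyclic(suspects_lineage),
      is_deterministic(suspects_lineage),
      is_uniform(suspects_lineage))
```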
    24. 24. ULDB Minimality
        • Data-minimality
            • Does every alternative appear in some possible instance? (no extraneous alternatives)
            • Does every maybe-tuple in R not appear in some possible instance? (no extraneous ‘?’s)
        • Lineage-minimality
    25. 25. Data-Minimality Examples
        • Extraneous ‘?’
        ID 10: (Billy, Honda) ∥ (Frank, Honda)
        ...
        ID 20: Billy ∥ Frank ?  (extraneous)
        λ(20,1) = (10,1); λ(20,2) = (10,2)
    26. 26. Data-Minimality Examples
        • Extraneous alternative
        [Figure: example relating (Diane, Mazda) ∥ (Diane, Acura), (Diane, Mazda), (Diane, Acura), and Diane, with three ‘?’ annotations and one alternative marked extraneous]
    27. 27. Data-Minimization
        • Extraneous alternative theorem:
            • An alternative is extraneous iff it is (possibly transitively) derived from multiple alternatives of the same tuple.
        • Extraneous “?” theorem
            • A “?” on tuple t is extraneous iff
                • it is derived from base tuples without “?”
                • t has as many alternatives as the product of the number in its base tuples
        • Minimization algorithm based on the theorems (see paper)
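The extraneous-alternative theorem suggests a direct check: walk the transitive lineage of an alternative and look for two different alternatives of the same tuple. The sketch below is one reading of the theorem statement, not the minimization algorithm from the paper. The IDs 10, 20, 21, 30, and 31 are hypothetical.

```python
def ancestors(alt, lineage):
    """All alternatives that alt is (possibly transitively) derived from."""
    seen = set()
    stack = list(lineage.get(alt, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen

def is_extraneous(alt, lineage):
    """True if alt depends on two distinct alternatives of the same tuple."""
    tuple_ids = [tuple_id for tuple_id, _ in ancestors(alt, lineage)]
    return len(tuple_ids) != len(set(tuple_ids))

# Hypothetical lineage: tuple 10 has two alternatives, (10,1) and (10,2).
lineage = {
    (20, 1): {(10, 1)},           # derived from the first alternative
    (21, 1): {(10, 2)},           # derived from the second alternative
    (30, 1): {(20, 1), (21, 1)},  # needs both alternatives of tuple 10 at once
    (31, 1): {(20, 1)},
}
print(is_extraneous((30, 1), lineage), is_extraneous((31, 1), lineage))
```

Alternative `(30, 1)` can never appear in a possible instance, since no instance contains both alternatives of tuple 10.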
    28. 28. ULDB Properties and Operations
        [Diagram: properties (Data-minimal, Lineage-minimal) and operations (Queries, Data-minimize, Lineage-minimize, Membership, Extraction)]
    29. 29. Membership Questions
        • Does a given tuple t appear in some (all) possible instance(s) of R?
            • Polynomial algorithms based on Data-minimization
        • Is a given table T one of (all of) the possible instances of R?
            • NP-Hard
        [Diagram: t?, T? against R and its possible instances I1, I2, …, In]
    30. 30. Extraction
        • Extraction algorithm in paper
        [Diagram: relations Suspects, Eats, Saw, Drives]
    31. 31. Outline of the Talk
        • The ULDB data model
        • Querying ULDBs
        • ULDB properties
        • Membership and extraction operations
        • Confidences
        • Current, related, and future work
    32. 32. Confidences
        • Confidences supplied with base data
        • Trio computes confidences on query results
            • Default probabilistic interpretation
            • Can choose to plug in different arithmetic
        Saw (witness, car)
            (Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4
        Drives (person, car)
            (Jimmy, Mazda): 0.3 ∥ (Bill, Mazda): 0.6 ?
            (Hank, Honda): 1.0
        Suspects
            Probabilistic: Jimmy: 0.12 ∥ Bill: 0.24 ?;  Hank: 0.6 ?
            Min:           Jimmy: 0.3 ∥ Bill: 0.4;  Hank: 0.6
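The numbers on this slide can be reproduced with a small sketch in which the arithmetic is a pluggable function applied to the confidences in an alternative's lineage. The lineage below is inferred from the join values and is not shown on the slide. A plain product is assumed to be valid only because each suspect here comes from a single conjunction of independent base alternatives. The names `base_confidence`, `lineage`, and `confidence` are illustrative.

```python
import math

# Confidences of base alternatives, as shown on the slide.
base_confidence = {
    ("Saw", "Cathy", "Honda"): 0.6,
    ("Saw", "Cathy", "Mazda"): 0.4,
    ("Drives", "Jimmy", "Mazda"): 0.3,
    ("Drives", "Bill", "Mazda"): 0.6,
    ("Drives", "Hank", "Honda"): 1.0,
}

# Assumed lineage: the Saw and Drives alternatives that join on the car.
lineage = {
    "Jimmy": [("Saw", "Cathy", "Mazda"), ("Drives", "Jimmy", "Mazda")],
    "Bill": [("Saw", "Cathy", "Mazda"), ("Drives", "Bill", "Mazda")],
    "Hank": [("Saw", "Cathy", "Honda"), ("Drives", "Hank", "Honda")],
}

def confidence(suspect, arithmetic):
    """Combine the confidences in a suspect's lineage with the given arithmetic."""
    return arithmetic(base_confidence[alt] for alt in lineage[suspect])

probabilistic = {s: confidence(s, math.prod) for s in lineage}
minimum = {s: confidence(s, min) for s in lineage}
print(probabilistic)
print(minimum)
```

Swapping `math.prod` for `min` is the only change between the two rows, which is the sense in which a different arithmetic can be plugged in.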
    33. 33. Query Processing with Confidences
        • Previous approach (probabilistic databases)
            • Each operator computes confidences during query execution
            • Only certain query plans allowed
        • In ULDBs
            • Confidence of alternative A is function of confidences in its transitive lineage
        • Our approach: Decouple data and confidence computation
            • Use any query plan for data computation
            • Compute confidences on-demand using lineage
            • Can give arbitrarily large improvements
    34. 34. Current Work: Algorithms
        • Algorithms: confidence computation, extraneous data, membership questions
            • Minimize lineage traversal
            • Memoization
            • Batch computations
    35. 35. The Trio Trio
        • Data Model
            • ULDBs (Coming: incomplete relations; continuous uncertainty; correlation uncertainty)
        • Query Language (TriQL)
            • Simple extension to SQL
            • Query uncertainty, confidences, and lineage
        • System
            • Did you see our demo?
            • Version 1: Entirely on top of conventional DBMS
            • Surprisingly easy and complete, reasonably efficient
    36. 36. Brief Related Work
        • Uncertainty
            • Modeling
                • C-tables [IL84], Probabilistic Databases [CP87], using Nested Relations [F90]
            • Systems
                • ProbView [LLRS97], MYSTIQ [BDM+05], ORION [CSP05], Trio [BDHW05]
        • Lineage
            • DBNotes [CTV05], Data Warehouses [CW03]
    37. 37. but don’t forget the lineage…
        Thank You
        Search “stanford trio” (or, http://i.stanford.edu/trio)
        DATA UNCERTAINTY LINEAGE
