Dependable Cardinality Forecasts for XQuery
                   Jens Teubner, ETH (formerly IBM Research)
Transcript of "Dependable Cardinality Forecast for XQuery"

  1. Dependable Cardinality Forecasts for XQuery
     Jens Teubner, ETH (formerly IBM Research)
     Torsten Grust, U Tübingen (formerly TUM)
     Sebastian Maneth, UNSW and NICTA
     Sherif Sakr, UNSW and NICTA
     Systems Group, Department of Computer Science, ETH Zürich
     August 26, 2008
  2. Cardinality Estimation for XQuery
     The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard.

         for $d in doc ("forecast.xml")/descendant::day
         let $day := $d/@t
         let $ppcp := data ($d/descendant::ppcp)
         return
           if ($ppcp > 50)
           then ("rain likely on", $day,
                 "chance of precipitation:", $ppcp)
           else ("no rain on", $day)

       • for iteration
       • sequence construction
       • conditionals (if-then-else)
       • ...
       • existential quantification
     (XPath is not a focus of this work.)
  3. Cardinality Estimation for XQuery
     The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard.

         for $d in doc ("forecast.xml")/descendant::day      (: card1 = ? :)
         let $day := $d/@t
         let $ppcp := data ($d/descendant::ppcp)
         return
           if ($ppcp > 50)                                   (: card2 = ? :)
           then ("rain likely on", $day,                     (: card3 = ? :)
                 "chance of precipitation:", $ppcp)
           else ("no rain on", $day)                         (: card4 = ? :)

       • for iteration
       • sequence construction
       • conditionals (if-then-else)
       • ...
       • existential quantification
     (XPath is not a focus of this work.)
     → Goal: Compute subexpression-level cardinalities card_i.
  4. Idea: Perform cardinality estimation on relational plan equivalents for XQuery.

         relational cardinality estimation (System R)
       + existing work on XPath estimation
       + histograms for value predicates
       = cardinality estimation for XQuery

     Build on Pathfinder’s XQuery-to-relational algebra compiler.1
       → tuple count ≡ XQuery item count
     Cardinality information for each subexpression.
     1 http://www.pathfinder-xquery.org/
  5. Example (Plan details not of interest today.)

         for $d in doc ("forecast.xml")/descendant::day
         let $day := $d/@t
         let $ppcp := data ($d/descendant::ppcp)
         return
           if ($ppcp > 50)
           then ("rain likely on", $day,
                 "chance of precipitation:", $ppcp)
           else ("no rain on", $day)

     [Figure: relational algebra plan compiled from this query; markers 1 to 4 tie plan nodes to query subexpressions.]
       • Pathfinder compiles XQuery with arbitrary nesting.
       • Maintain correspondence to original query if back-end is not relational.
       • Derive estimates based on an inference rule set.
  6. Relational XQuery Cardinality Estimation
     Apply System R-style estimation to relational XQuery plans, e.g.,
       • Disjoint union:
           \[ |q_1 \mathbin{\dot\cup} q_2| = |q_1| + |q_2| \]
       • Cartesian product:
           \[ |q_1 \times q_2| = |q_1| \cdot |q_2| \]
       • Equi-join:
           \[ |q_1 \bowtie_{a=b} q_2| =
              \begin{cases}
                \dfrac{|q_1| \cdot |q_2|}{\max\{|a|_{idx}, |b|_{idx}\}} & \text{if there are indexes on both join columns,} \\[1ex]
                \dfrac{|q_1| \cdot |q_2|}{|a|_{idx}} & \text{if there is only an index on column } a, \\[1ex]
                |q_1| \cdot |q_2| \cdot 1/10 & \text{otherwise.}
              \end{cases} \]
     \(|c|_{idx}\): Number of unique values in index on column c.
  7. Relational XQuery Cardinality Estimation
     Apply System R-style estimation to relational XQuery plans, e.g.,
       • Disjoint union:
           \[ |q_1 \mathbin{\dot\cup} q_2| = |q_1| + |q_2| \]
       • Cartesian product:
           \[ |q_1 \times q_2| = |q_1| \cdot |q_2| \]
       • Equi-join:
           \[ |q_1 \bowtie_{a=b} q_2| =
              \begin{cases}
                \dfrac{|q_1| \cdot |q_2|}{\max\{|a|_{idx}, |b|_{idx}\}} & \text{if there are indexes on both join columns (?),} \\[1ex]
                \dfrac{|q_1| \cdot |q_2|}{|a|_{idx}} & \text{if there is only an index on column } a \text{ (?),} \\[1ex]
                |q_1| \cdot |q_2| \cdot 1/10 & \text{otherwise.}
              \end{cases} \]
     Our joins typically operate over computed relations.
     \(|c|_{idx}\): Number of unique values in index on column c.
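The three System R-style rules on slides 6 and 7 can be written down as a small Python sketch. The function names and the `idx_a`/`idx_b` parameters (number of unique values in an index, `None` when no index exists) are illustrative assumptions, not part of Pathfinder or System R. The case "index only on column b" is not listed on the slide; treating it symmetrically is an assumption.

```python
from typing import Optional


def disjoint_union(card_q1: float, card_q2: float) -> float:
    """|q1 disjoint-union q2| = |q1| + |q2|."""
    return card_q1 + card_q2


def cartesian_product(card_q1: float, card_q2: float) -> float:
    """|q1 x q2| = |q1| * |q2|."""
    return card_q1 * card_q2


def equi_join(card_q1: float, card_q2: float,
              idx_a: Optional[int] = None,
              idx_b: Optional[int] = None) -> float:
    """Equi-join estimate for q1 joined with q2 on a = b.

    idx_a / idx_b: unique values in the index on column a / b,
    or None if the column has no index.
    """
    product = card_q1 * card_q2
    if idx_a is not None and idx_b is not None:
        return product / max(idx_a, idx_b)   # indexes on both join columns
    if idx_a is not None:
        return product / idx_a               # index on column a only
    if idx_b is not None:
        return product / idx_b               # assumption: symmetric to the case above
    return product / 10                      # no index: fixed 1/10 selectivity


both = equi_join(100, 50, idx_a=20, idx_b=25)
only_a = equi_join(100, 50, idx_a=20)
no_index = equi_join(100, 50)
```

For joins over computed relations no index statistics exist, so only the last branch would apply; that is the gap the abstract domain identifiers on the following slides address.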
  8. Abstract Domain Identifiers
     A simple form of data flow analysis provides the information needed.
       • Introduce abstract domain identifiers α, β, ... as placeholders for the active runtime domain for each column c. (Read \(c^\alpha\) as “column c contains values from domain α.”)
       • Estimate the size \(\|\alpha\|\) of each domain α, e.g.,2
           \[ dom(\varrho_{a:\langle b_1,\dots,b_n\rangle}(q)) \supseteq dom(q) \cup \{a^\alpha\} \;\wedge\; \|\alpha\| \stackrel{!}{=} |q| . \]
       • Identify inclusion relationships \(\alpha \sqsubseteq \beta\) between domains, e.g.,
           \[ a^\alpha \in dom(q) \;\wedge\; a^\beta \in dom(\sigma_{\cdots}(q)) \;\Rightarrow\; \beta \sqsubseteq \alpha . \]
     2 Operator \(\varrho_{a:\langle b_1,\dots,b_n\rangle}\) introduces a new key column (holding row numbers).
  9. Abstract Domain Identifiers
     Use abstract domain information for cardinality estimation. E.g., “foreign key” join:
         \[ \frac{a^\alpha \in dom(q_1) \qquad b^\beta \in dom(q_2) \qquad \alpha \sqsubseteq \beta}
                 {|q_1 \bowtie_{a=b} q_2| = \dfrac{|q_1| \cdot |q_2|}{\|\beta\|}} \]
     Domain inclusion guarantees that each tuple in q1 finds (at least one) join partner in q2.
     Other examples:
       • \(|q_1 \setminus q_2| = |q_1| - |q_2|\) if q2 is a subset of q1.
       • \(|q_1 \setminus q_2| = 0\) if q1 is a subset of q2.
       • \(|q_1 \setminus q_2| = |q_1|\) if q1 and q2 are disjoint.
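A minimal Python sketch of how the domain bookkeeping from slides 8 and 9 might feed the “foreign key” join and difference rules. The `Domains` class, the function names, and the numbers in the example (a selection that leaves 300 of 990 values) are hypothetical; the inclusion relation is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Domains:
    """Estimated sizes and inclusion relationships of abstract domains."""
    size: Dict[str, float] = field(default_factory=dict)
    supers: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, name: str, size: float, subset_of: Tuple[str, ...] = ()) -> None:
        self.size[name] = size
        self.supers[name] = set(subset_of)

    def included(self, sub: str, sup: str) -> bool:
        """True if 'sub is included in sup' follows from the recorded facts
        (reflexive and transitive; assumes no cycles)."""
        if sub == sup:
            return True
        return any(self.included(s, sup) for s in self.supers.get(sub, ()))


def foreign_key_join(doms: Domains, card_q1: float, dom_a: str,
                     card_q2: float, dom_b: str) -> float:
    """|q1 join q2 on a = b| = |q1| * |q2| / size(dom_b), given dom_a within dom_b."""
    if not doms.included(dom_a, dom_b):
        raise ValueError("rule requires the domain of a to be included in the domain of b")
    return card_q1 * card_q2 / doms.size[dom_b]


def difference(card_q1: float, card_q2: float, relation: str) -> float:
    """|q1 minus q2| for the three cases listed on slide 9."""
    if relation == "q2_subset_of_q1":
        return card_q1 - card_q2
    if relation == "q1_subset_of_q2":
        return 0
    if relation == "disjoint":
        return card_q1
    raise ValueError("unknown relation: " + relation)


doms = Domains()
doms.add("alpha", 990)                        # key column over a 990-row input
doms.add("beta", 300, subset_of=("alpha",))   # after a selection; 300 is made up
estimate = foreign_key_join(doms, 300, "beta", 990, "alpha")
```

Because alpha is the domain of a key column here, every one of the 300 tuples finds exactly one partner, and the estimate comes out at 300 rather than the 29,700 the 1/10 fallback would give.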
  10. Interfacing with XPath: Projection Paths
      Track XPath navigation by means of projection paths.3

      Example document tree: a has children b, c, d; d has child e.

          q1              q2 = step c:child::*(b) applied to q1
          a | b           a | b  | c
          --+----         --+----+----
          1 | γa          1 | γa | γb
          2 | γb          1 | γa | γc
          3 | γd          1 | γa | γd
                          3 | γd | γe

          \[ b \Rightarrow p \in path(q_1) \;\Longrightarrow\; c \Rightarrow p/\texttt{child::*} \in path(q_2) \]

      Step operator makes XPath navigation explicit in relational plans (compiles to join on SQL back-ends).
      3 A. Marian and J. Siméon. Projecting XML Documents. VLDB 2003.
  11. Interfacing with XPath: Cardinality Inference
      Same example as on the previous slide (tree: a has children b, c, d; d has child e), with \(b^\alpha\) in q1 and \(b^{\alpha'}\) in q2.

          q1              q2 = step c:child::*(b) applied to q1
          a | b           a | b  | c
          --+----         --+----+----
          1 | γa          1 | γa | γb
          2 | γb          1 | γa | γc
          3 | γd          1 | γa | γd
                          3 | γd | γe

      Cardinality (fanout, here: 4/3):
          \[ |q_2| = |q_1| \cdot \frac{\texttt{fn:count}(p/\texttt{child::*})}{\texttt{fn:count}(p)} = |q_1| \cdot Pr_{\texttt{child::*}}(p) \]
      Domain sizes (selectivity, here: 2/3):
          \[ \|\alpha'\| = \|\alpha\| \cdot \frac{\texttt{fn:count}(p\,[\,\texttt{child::*}\,])}{\texttt{fn:count}(p)} = \|\alpha\| \cdot Pr_{[\texttt{child::*}]}(p) \]
      Any XPath estimator that provides \(Pr_{p_2}(p_1)\) and \(Pr_{[p_2]}(p_1)\) will do. Our prototype uses a simple Data Guide-based implementation.
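The fanout 4/3 and selectivity 2/3 quoted on slide 11 can be reproduced on the toy tree from slide 10. The sketch below counts children directly; it is not the Data Guide-based estimator of the prototype, and the names `CHILDREN`, `fanout`, and `selectivity` are illustrative. It assumes the context nodes of q1 (γa, γb, γd) are exactly the nodes reached by path p, so that their number plays the role of fn:count(p).

```python
from fractions import Fraction

# Toy document tree from slide 10: a has children b, c, d; d has child e.
CHILDREN = {"a": ["b", "c", "d"], "b": [], "c": [], "d": ["e"], "e": []}


def fanout(context):
    """Average number of child::* results per context node,
    i.e. fn:count(p/child::*) / fn:count(p)."""
    return Fraction(sum(len(CHILDREN[n]) for n in context), len(context))


def selectivity(context):
    """Fraction of context nodes with at least one child,
    i.e. fn:count(p[child::*]) / fn:count(p)."""
    return Fraction(sum(1 for n in context if CHILDREN[n]), len(context))


CONTEXT = ["a", "b", "d"]            # column b of q1: the nodes γa, γb, γd

card_q1 = len(CONTEXT)               # |q1| = 3
card_q2 = card_q1 * fanout(CONTEXT)  # |q2| = |q1| * fanout

size_alpha = 3                                         # three distinct context nodes
size_alpha_prime = size_alpha * selectivity(CONTEXT)   # context nodes that survive the step
```

Using `Fraction` keeps the ratios exact, so the results can be compared with the table for q2: four rows, and two distinct context nodes (γa and γd) that survive the step.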
  12. Back to our Example Plan
      [Figure: relational plan for the example query, with markers 1 to 4.]
      Atomization operator on column a: Retrieve typed values for node identifiers in column a.
  13. Back to our Example Plan
      Annotations on the step item:descendant::ppcp(item):
        • Projection path:
            \[ item \Rightarrow \cdots/\texttt{descendant::ppcp} \in path(\mathrm{step}_{\cdots}(q)) \]
        • Cardinality (uses fanout):
            \[ |\mathrm{step}_{item:ax::nt(item)}(q)| = |q| \cdot Pr_{ax::nt}(\cdots) \]
  14. Back to our Example Plan
      Column iter carries domain α below the step and α' above it.
        • Domain inclusion:
            \[ iter^{\alpha'} \in dom(\mathrm{step}_{\cdots}(q)) ; \quad \alpha' \sqsubseteq \alpha \]
        • Domain size (uses step selectivity):
            \[ \|\alpha'\| = \|\alpha\| \cdot Pr_{[ax::nt]}(\cdots) \]
  15. Back to our Example Plan
      “Foreign key” join between \(iter1^{\alpha'}\) and \(iter^{\alpha}\):
          \[ |q_1 \bowtie_{iter1=iter} q_2| = \frac{|q_1| \cdot |q_2|}{\|\alpha\|} \quad (\text{since } \alpha' \sqsubseteq \alpha) \]
  16. Back to our Example Plan
      Estimated cardinalities at the plan markers:
        • card1: 990
        • card2: 4104
        • card3: 3540
        • card4: 564
  17. Forecasting New Zealand’s Weather
      The obtained cardinalities can be mapped back to predict item counts for corresponding XQuery expressions:4

          for $d in doc ("forecast.xml")/descendant::day      (: card1 = 990/990 :)
          let $day := $d/@t
          let $ppcp := data ($d/descendant::ppcp)
          return
            if ($ppcp > 50)                                   (: card2 = 4104/3402 :)
            then ("rain likely on", $day,                     (: card3 = 3540/2370 :)
                  "chance of precipitation:", $ppcp)
            else ("no rain on", $day)                         (: card4 = 564/1032 :)

        • Value statistics based on histograms (→ paper)
        • Inaccuracy is mainly due to correlations in the data. Rain in the morning likely means rain in the afternoon, too.
      4 estimated/observed, based on data taken for New Zealand two weeks ago.
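To read the estimated/observed pairs on slide 17 as error ratios (the quantity plotted on the next slide), a short Python sketch follows. The dictionary layout and variable names are illustrative; the numbers are taken from the slide.

```python
# (estimated, observed) item counts from slide 17
CARDS = {
    "card1": (990, 990),
    "card2": (4104, 3402),
    "card3": (3540, 2370),
    "card4": (564, 1032),
}

# Ratio estimated/observed: 1.0 is a perfect estimate,
# above 1 an over-estimate, below 1 an under-estimate.
ratios = {name: est / obs for name, (est, obs) in CARDS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")

# The if-then-else result is the union of both branches, so card2
# should equal card3 + card4 for the estimates and the observations alike.
est_consistent = CARDS["card3"][0] + CARDS["card4"][0] == CARDS["card2"][0]
obs_consistent = CARDS["card3"][1] + CARDS["card4"][1] == CARDS["card2"][1]
```

The then-branch is over-estimated (ratio about 1.49) and the else-branch under-estimated (about 0.55), while the combined estimate for the whole conditional is closer (about 1.21), because the two errors partly cancel in the sum.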
  18. More Realistic Queries: W3C XQuery Use Cases
      [Figure: ratio estimated cardinality/observed cardinality for each subexpression, on an axis from 0.01 to 10, one row per query: XMP Q4, Q5, Q6, Q8, Q9, Q10, Q11; SGML Q3, Q4, Q8a, Q8b; R Q2, Q3, Q10, Q11, Q13, Q14, Q15, Q17. Diameter indicates data point “stacking”; plan root: filled circle.]
        • Prototype implementation based on Pathfinder
        • For each subexpression: estimated cardinality / observed cardinality
        • Applicable to “real” queries
        • Recovery from intermediate mis-estimations, e.g., existential semantics
  19. Wrap-Up
        • Cardinality estimation framework for XQuery
            • Subexpression-level estimates for arbitrary XQuery expressions
            • Based on Pathfinder’s XQuery-to-relational algebra compiler

                relational cardinality estimation (System R)
              + existing work on XPath estimation
              + histograms for value predicates
              = cardinality estimation for XQuery

        • High-quality estimates for realistic XQuery workloads
            • Robust with respect to intermediate errors
        • Pluggable and extensible
            • e.g., XPath estimation subsystem, positional predicates