
Dependable Cardinality Forecast for XQuery

Published in: Education, Technology, Business

  1. Dependable Cardinality Forecasts for XQuery. Jens Teubner, ETH (formerly IBM Research); Torsten Grust, U Tübingen (formerly TUM); Sebastian Maneth, UNSW and NICTA; Sherif Sakr, UNSW and NICTA. Systems Group, Department of Computer Science, ETH Zürich. August 26, 2008.
  2. Cardinality Estimation for XQuery. The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard.

       for $d in doc ("forecast.xml")/descendant::day
       let $day := $d/@t
       let $ppcp := data ($d/descendant::ppcp)
       return if ($ppcp > 50)
              then ("rain likely on", $day, "chance of precipitation:", $ppcp)
              else ("no rain on", $day)

     Language features involved: for iteration, sequence construction, conditionals (if-then-else), ..., existential quantification. (XPath is not a focus of this work.)
  3. Cardinality Estimation for XQuery (continued). The same query, annotated with the cardinalities to be estimated:

       for $d in doc ("forecast.xml")/descendant::day                           (: card1 = ? :)
       let $day := $d/@t
       let $ppcp := data ($d/descendant::ppcp)
       return if ($ppcp > 50)                                                   (: card2 = ? :)
              then ("rain likely on", $day, "chance of precipitation:", $ppcp)  (: card3 = ? :)
              else ("no rain on", $day)                                         (: card4 = ? :)

     Goal: compute subexpression-level cardinalities $\mathit{card}_i$.
  4. Idea: Perform cardinality estimation on relational plan equivalents for XQuery.

       relational cardinality estimation (System R)
       + existing work on XPath estimation
       + histograms for value predicates
       = cardinality estimation for XQuery

     Build on Pathfinder’s XQuery-to-relational algebra compiler (http://www.pathfinder-xquery.org/). In its plans, tuple count ≡ XQuery item count, which gives cardinality information for each subexpression.
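The equivalence "tuple count ≡ XQuery item count" can be illustrated with a small sketch. It assumes a table layout with columns `iter`, `pos`, and `item`, which are the column names visible in the example plan; the helper `encode`, the `Row` type, and the sample dates and values are hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    iter: int     # iteration of the enclosing for loop
    pos: int      # position of the item within that iteration's sequence
    item: object  # the item itself


def encode(sequences_per_iteration):
    """Flatten {iteration: [items, ...]} into (iter, pos, item) rows."""
    return [
        Row(it, pos, item)
        for it, seq in sorted(sequences_per_iteration.items())
        for pos, item in enumerate(seq, start=1)
    ]


# Hypothetical result of the return clause for three iterations.
result = {
    1: ["rain likely on", "2008-08-12", "chance of precipitation:", 80],
    2: ["no rain on", "2008-08-13"],
    3: ["no rain on", "2008-08-14"],
}

rows = encode(result)
item_count = sum(len(seq) for seq in result.values())
```

Every item becomes exactly one row, so estimating the number of tuples of a plan node also estimates the item count of the corresponding subexpression.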
  5. Example. (Plan details not of interest today.) [Figure: Pathfinder’s relational algebra plan for the query below; markers 1 to 4 link plan operators to query subexpressions.]

       for $d in doc ("forecast.xml")/descendant::day
       let $day := $d/@t
       let $ppcp := data ($d/descendant::ppcp)
       return if ($ppcp > 50)
              then ("rain likely on", $day, "chance of precipitation:", $ppcp)
              else ("no rain on", $day)

     Pathfinder compiles XQuery with arbitrary nesting. Maintain correspondence to the original query if the back-end is not relational. Derive estimates based on an inference rule set.
  6. Relational XQuery Cardinality Estimation. Apply System R-style estimation to relational XQuery plans, e.g.,

     Disjoint union: $|q_1 \mathbin{\dot\cup} q_2| = |q_1| + |q_2|$
     Cartesian product: $|q_1 \times q_2| = |q_1| \cdot |q_2|$
     Equi-join:
     \[
     |q_1 \bowtie_{a=b} q_2| =
     \begin{cases}
       \dfrac{|q_1| \cdot |q_2|}{\max\{|a|_{\mathrm{idx}}, |b|_{\mathrm{idx}}\}} & \text{if there are indexes on both join columns,} \\[1ex]
       \dfrac{|q_1| \cdot |q_2|}{|a|_{\mathrm{idx}}} & \text{if there is only an index on column } a, \\[1ex]
       |q_1| \cdot |q_2| \cdot 1/10 & \text{otherwise.}
     \end{cases}
     \]
     $|c|_{\mathrm{idx}}$: number of unique values in the index on column $c$.
  7. Relational XQuery Cardinality Estimation (continued). The same rules for disjoint union, Cartesian product, and equi-join, now with the two index-based equi-join cases marked "?": Our joins typically operate over computed relations.
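The System R-style rules for disjoint union, Cartesian product, and equi-join can be sketched as follows. This is a minimal illustration, not the prototype's code; the function names are hypothetical, and treating "only an index on column b" symmetrically to "only an index on column a" is an assumption that the slide does not state.

```python
from typing import Optional


def est_disjoint_union(card_q1: float, card_q2: float) -> float:
    """|q1 disjoint-union q2| = |q1| + |q2|"""
    return card_q1 + card_q2


def est_cartesian_product(card_q1: float, card_q2: float) -> float:
    """|q1 x q2| = |q1| * |q2|"""
    return card_q1 * card_q2


def est_equi_join(
    card_q1: float,
    card_q2: float,
    idx_a: Optional[int] = None,  # unique values in the index on a, if any
    idx_b: Optional[int] = None,  # unique values in the index on b, if any
) -> float:
    """Equi-join estimate for q1 join q2 on a = b."""
    product = card_q1 * card_q2
    # Assumption: a lone index on b is handled like a lone index on a.
    known = [n for n in (idx_a, idx_b) if n is not None]
    if known:
        return product / max(known)
    # No index statistics available: fall back to the 1/10 default.
    return product / 10
```

For a join over computed relations neither `idx_a` nor `idx_b` is available, so only the 1/10 fallback applies; this is the gap that the abstract domain identifiers are meant to close.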
  8. Abstract Domain Identifiers. A simple form of data flow analysis provides the information needed.

     Introduce abstract domain identifiers $\alpha, \beta, \ldots$ as placeholders for the active runtime domain of each column $c$. (Read $c^{\alpha}$ as "column $c$ contains values from domain $\alpha$.")

     Estimate the size $\|\alpha\|$ of each domain $\alpha$, e.g., for the row-numbering operator $\varrho_{a:\langle b_1,\ldots,b_n\rangle}$, which introduces a new key column $a$ (holding row numbers):
     \[
     \mathrm{dom}\bigl(\varrho_{a:\langle b_1,\ldots,b_n\rangle}(q)\bigr) \supseteq \mathrm{dom}(q) \cup \{a^{\alpha}\} \;\wedge\; \|\alpha\| \overset{!}{=} |q| .
     \]

     Identify inclusion relationships $\alpha \sqsubseteq \beta$ between domains, e.g.,
     \[
     a^{\alpha} \in \mathrm{dom}(q) \;\wedge\; a^{\beta} \in \mathrm{dom}(\sigma_{\cdots}(q)) \;\Rightarrow\; \beta \sqsubseteq \alpha .
     \]
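The two data flow rules for abstract domain identifiers can be sketched as follows: row numbering creates a fresh domain whose size equals the input cardinality, and selection gives each column a fresh domain that is included in the original one. The class `DomainInfo`, the functions `row_number` and `select`, and the generated identifiers `d0`, `d1`, ... are hypothetical names chosen for this sketch.

```python
import itertools


class DomainInfo:
    """Bookkeeping for abstract domain identifiers."""

    def __init__(self):
        self.size = {}    # domain id -> estimated size
        self.parent = {}  # domain id -> domain it is included in
        self._ids = itertools.count()

    def fresh(self) -> str:
        return f"d{next(self._ids)}"


def row_number(columns: dict, card: int, new_col: str, info: DomainInfo) -> dict:
    """Row numbering: add key column new_col with a fresh domain of size |q|."""
    dom = info.fresh()
    info.size[dom] = card
    return {**columns, new_col: dom}


def select(columns: dict, info: DomainInfo) -> dict:
    """Selection: each column gets a fresh domain included in its old domain."""
    out = {}
    for col, dom in columns.items():
        sub = info.fresh()
        info.parent[sub] = dom  # records: sub is included in dom
        out[col] = sub
    return out


info = DomainInfo()
numbered = row_number({}, card=990, new_col="iter", info=info)
filtered = select(numbered, info)
```

Here `columns` maps each column name to its current domain identifier, so the analysis never looks at actual values, only at how operators transform the mapping.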
  9. Abstract Domain Identifiers (continued). Use abstract domain information for cardinality estimation. E.g., "foreign key" join:
     \[
     \frac{a^{\alpha} \in \mathrm{dom}(q_1) \qquad b^{\beta} \in \mathrm{dom}(q_2) \qquad \alpha \sqsubseteq \beta}
          {|q_1 \bowtie_{a=b} q_2| = \dfrac{|q_1| \cdot |q_2|}{\|\beta\|}}
     \]
     Domain inclusion guarantees that each tuple in $q_1$ finds (at least one) join partner in $q_2$.

     Other examples:
     $|q_1 \setminus q_2| = |q_1| - |q_2|$ if $q_2$ is a subset of $q_1$.
     $|q_1 \setminus q_2| = 0$ if $q_1$ is a subset of $q_2$.
     $|q_1 \setminus q_2| = |q_1|$ if $q_1$ and $q_2$ are disjoint.
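A minimal sketch of how the "foreign key" join rule and the difference rules might be applied, assuming domain sizes and inclusions are kept in two plain dictionaries. The function names, the single-parent inclusion chain, and all numbers in the example are hypothetical.

```python
from typing import Dict, Optional


def is_included(sub: str, sup: str, parent: Dict[str, str]) -> bool:
    """True if domain sub is included in sup (reflexive and transitive).

    Assumes each domain records at most one enclosing domain and that the
    inclusion chain has no cycles.
    """
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = parent.get(current)
    return False


def est_fk_join(card_q1, card_q2, dom_a, dom_b, size, parent) -> Optional[float]:
    """|q1 join q2 on a=b| = |q1|*|q2| / size(dom_b), if dom_a is included in dom_b.

    Returns None when the inclusion is unknown and the rule does not apply.
    """
    if not is_included(dom_a, dom_b, parent):
        return None
    return card_q1 * card_q2 / size[dom_b]


def est_difference(card_q1: float, card_q2: float, relation: str) -> float:
    """Difference q1 minus q2 under a known set relationship."""
    if relation == "q2_subset_of_q1":
        return card_q1 - card_q2
    if relation == "q1_subset_of_q2":
        return 0
    if relation == "disjoint":
        return card_q1
    raise ValueError(f"unknown relation: {relation}")


# Hypothetical domains: alpha_prime is included in alpha.
size = {"alpha": 990, "alpha_prime": 660}
parent = {"alpha_prime": "alpha"}
joined = est_fk_join(1320, 990, "alpha_prime", "alpha", size, parent)
```

When `q2` holds exactly one tuple per value of the enclosing domain, the estimate reduces to `|q1|`, which matches the intuition of a foreign key join.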
  10. Interfacing with XPath: Projection Paths. Track XPath navigation by means of projection paths (A. Marian and J. Siméon. Projecting XML Documents. VLDB 2003).

      Example document tree: a has children b, c, d; d has child e. Applying the step operator $\mathrm{step}_{c:\texttt{child::*}(b)}$ to $q_1$ yields $q_2$:

        q1:   a | b           q2:   a | b   | c
              1 | γ_a               1 | γ_a | γ_b
              2 | γ_b               1 | γ_a | γ_c
              3 | γ_d               1 | γ_a | γ_d
                                    3 | γ_d | γ_e

      \[
      b \Rightarrow p \in \mathrm{path}(q_1) \;\Rightarrow\; c \Rightarrow p/\texttt{child::*} \in \mathrm{path}(q_2)
      \]
      The step operator makes XPath navigation explicit in relational plans (it compiles to a join on SQL back-ends).
  11. Interfacing with XPath: Cardinality Inference. For the step from $q_1$ to $q_2$ of the previous slide, where a column with domain $\alpha$ in $q_1$ carries the restricted domain $\alpha'$ in $q_2$:

      Cardinality (fanout, here: 4/3):
      \[
      |q_2| = |q_1| \cdot \frac{\texttt{fn:count}(p/\texttt{child::*})}{\texttt{fn:count}(p)} = |q_1| \cdot \mathrm{Pr}_{\texttt{child::*}}(p)
      \]
      Domain sizes (selectivity, here: 2/3):
      \[
      \|\alpha'\| = \|\alpha\| \cdot \frac{\texttt{fn:count}(p\,[\texttt{child::*}])}{\texttt{fn:count}(p)} = \|\alpha\| \cdot \mathrm{Pr}_{[\texttt{child::*}]}(p)
      \]
      Any XPath estimator that provides $\mathrm{Pr}_{p_2}(p_1)$ and $\mathrm{Pr}_{[p_2]}(p_1)$ will do. Our prototype uses a simple Data Guide-based implementation.
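The fanout 4/3 and selectivity 2/3 quoted for the example can be checked with a short sketch. The tree is the one read off the slide's example tables (a with children b, c, d; d with child e); the dictionary encoding and the function names are assumptions made for this illustration, and exact counts stand in for what an XPath estimator would only approximate.

```python
from fractions import Fraction

# Example tree from the slide: a has children b, c, d; d has child e.
children = {"a": ["b", "c", "d"], "b": [], "c": [], "d": ["e"], "e": []}

# Context nodes reached by path p, i.e. the nodes in column b of q1.
context = ["a", "b", "d"]


def fanout(context, children) -> Fraction:
    """fn:count(p/child::*) / fn:count(p): result nodes per context node."""
    return Fraction(sum(len(children[n]) for n in context), len(context))


def selectivity(context, children) -> Fraction:
    """fn:count(p[child::*]) / fn:count(p): share of context nodes with a child."""
    return Fraction(sum(1 for n in context if children[n]), len(context))


card_q1 = 3      # tuples in q1
size_alpha = 3   # distinct values in column a of q1

card_q2 = card_q1 * fanout(context, children)
size_alpha_prime = size_alpha * selectivity(context, children)
```

Summing child counts is only valid for the child axis, where different context nodes never share a result node; for an axis such as descendant, duplicates would have to be removed first.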
  12. Back to our Example Plan. [Figure: the relational plan of the example query, with regions 2, 3, and 4 marked.] Legend: the atomization operator on column a retrieves typed values for the node identifiers in column a.
  13. Back to our Example Plan (continued). [Figure: the same plan, with the step operator for descendant::ppcp annotated.]

      Projection path: $\texttt{item} \Rightarrow \cdots/\texttt{descendant::ppcp} \in \mathrm{path}\bigl(\mathrm{step}_{\ldots}(q)\bigr)$
      Cardinality (uses fanout): $\bigl|\mathrm{step}_{\texttt{item}:\mathit{ax}::\mathit{nt}(\texttt{item})}(q)\bigr| = |q| \cdot \mathrm{Pr}_{\mathit{ax}::\mathit{nt}}(\cdots)$
  14. Back to our Example Plan (continued). [Figure: the same plan, with the domains of column iter below and above the step operator annotated.]

      Domain inclusion: the step restricts the domain $\alpha$ of column iter to a domain $\alpha'$ with $\alpha' \sqsubseteq \alpha$.
      Domain size (uses step selectivity): $\|\alpha'\| = \|\alpha\| \cdot \mathrm{Pr}_{[\mathit{ax}::\mathit{nt}]}(\cdots)$
  15. Back to our Example Plan (continued). [Figure: the same plan, with the join on iter1 = iter annotated.]

      "Foreign key" join, with $\texttt{iter1}^{\alpha'}$ in $q_1$ and $\texttt{iter}^{\alpha}$ in $q_2$:
      \[
      |q_1 \bowtie_{\texttt{iter1}=\texttt{iter}} q_2| = \frac{|q_1| \cdot |q_2|}{\|\alpha\|} \quad (\text{since } \alpha' \sqsubseteq \alpha)
      \]
  16. Back to our Example Plan (continued). [Figure: the same plan, now labeled with the estimated cardinalities.]

      card1: 990
      card2: 4104
      card3: 3540
      card4: 564
  17. Forecasting New Zealand’s Weather. The obtained cardinalities can be mapped back to predict item counts for the corresponding XQuery expressions (shown as estimated/observed, based on data taken for New Zealand two weeks ago):

        for $d in doc ("forecast.xml")/descendant::day                           (: card1 = 990/990 :)
        let $day := $d/@t
        let $ppcp := data ($d/descendant::ppcp)
        return if ($ppcp > 50)                                                   (: card2 = 4104/3402 :)
               then ("rain likely on", $day, "chance of precipitation:", $ppcp)  (: card3 = 3540/2370 :)
               else ("no rain on", $day)                                         (: card4 = 564/1032 :)

      Value statistics are based on histograms (see the paper). Inaccuracy is mainly due to correlations in the data: rain in the morning likely means rain in the afternoon, too.
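To make the estimated/observed pairs easier to compare, the following sketch computes the ratio for each annotated subexpression and checks that the branch cardinalities add up to the cardinality of the conditional, as the disjoint union rule requires. Only the eight numbers come from the slide; the dictionary layout and variable names are assumptions.

```python
# (estimated, observed) item counts as given on the slide.
cards = {
    "card1": (990, 990),    # for ... in .../descendant::day
    "card2": (4104, 3402),  # if-then-else as a whole
    "card3": (3540, 2370),  # then branch
    "card4": (564, 1032),   # else branch
}

# Ratio estimated/observed; 1.0 is a perfect estimate,
# values above 1 are over-estimates, values below 1 under-estimates.
ratios = {name: est / obs for name, (est, obs) in cards.items()}

# The conditional is the disjoint union of its two branches, so the
# branch counts must sum to the count of the whole expression.
est_consistent = cards["card3"][0] + cards["card4"][0] == cards["card2"][0]
obs_consistent = cards["card3"][1] + cards["card4"][1] == cards["card2"][1]

for name, ratio in ratios.items():
    print(f"{name}: estimated/observed = {ratio:.2f}")
```

The then branch is over-estimated and the else branch under-estimated, so the two errors partly cancel at the level of the conditional.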
  18. More Realistic Queries: W3C XQuery Use Cases. Prototype implementation based on Pathfinder.

      [Figure: for each subexpression, the ratio estimated cardinality/observed cardinality on a logarithmic axis from 0.01 to 10, one row per query: XMP Q4, Q5, Q6, Q8, Q9, Q10, Q11; SGML Q3, Q4, Q8a, Q8b; R Q2, Q3, Q10, Q11, Q13, Q14, Q15, Q17. Diameter indicates data point "stacking"; the plan root is shown as a filled circle.]

      Applicable to "real" queries. Recovery from intermediate mis-estimations, e.g., existential semantics.
  19. Wrap-Up

      Cardinality estimation framework for XQuery
        • Subexpression-level estimates for arbitrary XQuery expressions
        • Based on Pathfinder’s XQuery-to-relational algebra compiler

        relational cardinality estimation (System R)
        + existing work on XPath estimation
        + histograms for value predicates
        = cardinality estimation for XQuery

      High-quality estimates for realistic XQuery workloads
        • Robust with respect to intermediate errors
        • Pluggable and extensible, e.g., XPath estimation subsystem, positional predicates
