Your SlideShare is downloading. ×
dbtech3.ppt
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

dbtech3.ppt

463
views

Published on


0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
463
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
8
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • From the above we can easily derive that sigma_{c_1} sigma_{c_2} R = sigma_{c_2} sigma_{c_1} R
  • The meaning of the arrows: rewritings versus axioms. What we write is axioms. The left to right direction is 99% the beneficial. However views complicate matters. Generalization to theta join Example 6.14
  • Watch out rules like that: you cannot apply them blindly. May lead to an explosion of alternative expressions. Use in the exam the rules of duplicates.
  • 1
  • 1
  • 1
  • Transcript

    • 1. Database Techniek Query Optimization (chapter 14)
    • 2. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
        • Practicum Assignment 2
    • 3. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
        • Practicum Assignment 2
    • 4. Transformation of Relational Expressions
      • Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples
        • Note: order of tuples is irrelevant
      • In SQL, inputs and outputs are bags (multi-sets) of tuples
        • Two expressions in the bag version of the relational algebra are said to be equivalent if on every legal database instance the two expressions generate the same bag of tuples
      • An equivalence rule says that expressions of two forms are equivalent
        • Can replace expression of first form by second, or vice versa
    • 5. Equivalence Rules
      • 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
      • 2. Selection operations are commutative.
      • 3. Only the last in a sequence of projection operations is needed, the others can be omitted.
      • Selections can be combined with Cartesian products and theta joins.
        •   (E 1 X E 2 ) = E 1  E 2
        •   1 (E 1  2 E 2 ) = E 1  1  2 E 2
    • 6. Algebraic Rewritings for Selection:  cond2  cond1 R  cond1 AND cond2 R  cond1 OR cond2 R   cond2 R  cond1  cond1  cond2 R
    • 7. Equivalence Rules (Cont.)
        • 5. Theta-join operations (and natural joins) are commutative . E 1  E 2 = E 2  E 1
        • 6. Natural join operations are associative :
        • ( E 1 E 2 ) E 3 = E 1 ( E 2 E 3 )
    • 8. Equivalence Rules for Joins commutative associative
    • 9.
      • For pushing down selections into a (theta) join we have the following cases:
        • (push 1) When all the attributes in  0 involve only the attributes of one of the expressions ( E 1 ) being joined.   0  E 1  E 2 ) = (   0 (E 1 ))  E 2
        • (split) When  1 involves only the attributes of E 1 and  2 involves only the attributes of E 2 .
      •   1    E 1  E 2 ) = (   1 (E 1 ))  (   (E 2 ))
        • (impossible) When  involves both attributes of E 1 and E 2 (it is a join condition)
      Equivalence Rules (Cont.)
    • 10. Pushing Selection thru Cartesian Product and Join  R S cond   cond S R  The right direction requires that cond refers to S attributes only  R S cond  cond S R
    • 11. Projection Decomposition  p XY p X p Y p   Y X  Y X  p   p X p  p Y p   X.tax Y.price total=X.tax*Y.price p  {total} {tax} {price} p XY p X p Y
    • 12.
      • 8. The projections operation distributes over the theta join operation as follows:
      • (a) if  involves only attributes from L 1  L 2 :
      • (b) Consider a join E 1  E 2 .
        • Let L 1 and L 2 be sets of attributes from E 1 and E 2 , respectively.
        • Let L 3 be attributes of E 1 that are involved in join condition  , but are not in L 1  L 2 , and
        • Let L 4 be attributes of E 2 that are involved in join condition  , but are not in L 1  L 2 .
      More Equivalence Rules
    • 13. Join Ordering Example
      • For all relations r 1, r 2, and r 3 ,
      • ( r 1 r 2 ) r 3 = r 1 ( r 2 r 3 )
      • If r 2 r 3 is quite large and r 1 r 2 is small, we choose
      • ( r 1 r 2 ) r 3
      • so that we compute and store a smaller temporary relation.
    • 14. Join Ordering Example (Cont.)
      • Consider the expression
      •  customer-name ((  branch-city = “Brooklyn” ( branch)) account depositor)
      • Could compute account depositor first, and join result with  branch-city = “Brooklyn” ( branch) but account depositor is likely to be a large relation.
      • Since it is more likely that only a small fraction of the bank’s customers have accounts in branches located in Brooklyn, it is better to compute
      •  branch-city = “Brooklyn” ( branch) account
      • first.
    • 15. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
    • 16. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
    • 17. The role of Query Optimization SQL physical algebra logical algebra parsing, normalization logical query optimization physical query optimization query execution
    • 18. The role of Query Optimization SQL physical algebra logical algebra parsing, normalization logical query optimization physical query optimization query execution Compare different relational algebra plan  on result size (Practicum 2A)
    • 19. The role of Query Optimization SQL physical algebra logical algebra parsing, normalization logical query optimization physical query optimization query execution
      • Compare different execution algorithms
      • on true cost
      • (IO, CPU, cache)
    • 20. Enumeration of Equivalent Expressions
      • Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
      • repeated until no more expressions can be found:
        • for each expression found so far, use all applicable equivalence rules, and add newly generated expressions to the set of expressions found so far
      • The above approach is very expensive in space and time
      • Time and space requirements are reduced by not generating all expressions
    • 21. Finding A Good Join Order
      • Consider finding the best join-order for r 1 r 2 . . . r n .
      • There are (2( n – 1))!/( n – 1)! different join orders for above expression. With n = 7, the number is 665280, with n = 10, the number is greater than 176 billion!
      • No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of { r 1 , r 2 , . . . r n } is computed only once and stored for future use.
    • 22. Dynamic Programming in Optimization
      • To find best join tree for a set of n relations:
        • To find best plan for a set S of n relations, consider all possible plans of the form: S 1 ( S – S 1 ) where S 1 is any non-empty subset of S .
        • Recursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the 2 n – 1 alternatives.
        • When plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it
          • Dynamic programming
    • 23. Join Order Optimization Algorithm
      • procedure findbestplan( S ) if ( bestplan [ S ]. cost   ) return bestplan [ S ] // else bestplan [ S ] has not been computed earlier, compute it now for each non-empty subset S 1 of S such that S 1  S P1= findbestplan( S 1) P2= findbestplan( S - S 1) A = best algorithm for joining results of P 1 and P 2 cost = P 1. cost + P 2. cost + cost of A if cost < bestplan [ S ]. cost bestplan [ S ]. cost = cost bestplan [ S ]. plan = “execute P 1. plan ; execute P 2. plan ; join results of P 1 and P 2 using A ” return bestplan [ S ]
    • 24. Left Deep Join Trees
      • In left-deep join trees, the right-hand-side input for each join is a relation, not the result of an intermediate join.
    • 25. Cost of Optimization
      • With dynamic programming time complexity of optimization with bushy trees is O (3 n ).
        • With n = 10, this number is 59000 instead of 176 billion!
      • Space complexity is O (2 n )
      • To find best left-deep join tree for a set of n relations:
        • Consider n alternatives with one relation as right-hand side input and the other relations as left-hand side input.
        • Using (recursively computed and stored) least-cost join order for each alternative on left-hand-side, choose the cheapest of the n alternatives.
      • If only left-deep trees are considered, time complexity of finding best join order is O ( n 2 n )
        • Space complexity remains at O (2 n )
      • Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10)
    • 26. Physical Query Optimization
      • Minimizes absolute cost
        • Minimize I/Os
        • Minimize CPU, cache miss cost (main memory DBMS)
      • Must consider the interaction of evaluation techniques when choosing evaluation plans: choosing the cheapest algorithm for each operation independently may not yield best overall algorithm. E.g.
        • merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation.
        • nested-loop join may provide opportunity for pipelining
    • 27. Physical Optimization: Interesting Orders
      • Consider the expression ( r 1 r 2 r 3 ) r 4 r 5
      • An interesting sort order is a particular sort order of tuples that could be useful for a later operation.
        • Generating the result of r 1 r 2 r 3 sorted on the attributes common with r 4 or r 5 may be useful, but generating it sorted on the attributes common only r 1 and r 2 is not useful.
        • Using merge-join to compute r 1 r 2 r 3 may be costlier, but may provide an output sorted in an interesting order.
      • Not sufficient to find the best join order for each subset of the set of n given relations; must find the best join order for each subset, for each interesting sort order
        • Simple extension of earlier dynamic programming algorithms
        • Usually, number of interesting orders is quite small and doesn’t affect time/space complexity significantly
    • 28. Heuristic Optimization
      • Cost-based optimization is expensive, even with dynamic programming.
      • Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
      • Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:
        • Perform selection early (reduces the number of tuples)
        • Perform projection early (reduces the number of attributes)
        • Perform most restrictive selection and join operations before other similar operations.
        • Some systems use only heuristics, others combine heuristics with partial cost-based optimization.
    • 29. Steps in Typical Heuristic Optimization
      • 1. Deconstruct conjunctive selections into a sequence of single selection operations (Equiv. rule 1.).
      • 2. Move selection operations down the query tree for the earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
      • 3. Execute first those selection and join operations that will produce the smallest relations (Equiv. rule 6).
      • 4. Replace Cartesian product operations that are followed by a selection condition by join operations (Equiv. rule 4a).
      • 5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed (Equiv. rules 3, 8a, 8b, 12).
      • 6. Identify those subtrees whose operations can be pipelined, and execute them using pipelining).
    • 30. Heuristic Join Order: the Wong-Youssefi algorithm (INGRES) Sample TPC-H Schema Nation(NationKey, NName) Customer(CustKey, CName, NationKey) Order(OrderKey, CustKey, Status) Lineitem(OrderKey, PartKey, Quantity) Product(SuppKey, PartKey, PName) Supplier(SuppKey, SName) SELECT SName FROM Nation, Customer, Order, LineItem, Product, Supplier WHERE Nation.NationKey = Cuctomer.NationKey AND Customer.CustKey = Order.CustKey AND Order.OrderKey=LineItem.OrderKey AND LineItem.PartKey= Product.Partkey AND Product.Suppkey = Supplier.SuppKey AND NName = “Canada” Find the names of suppliers that sell a product that appears in a line item of an order made by a customer who is in Canada
    • 31. Challenges with Large Natural Join Expressions
      • For simplicity, assume that in the query
      • All joins are natural
      • whenever two tables of the FROM clause have common
      • attributes we join on them
      • Consider Right-Index only
      Nation Customer Order LineItem Product Supplier σ NName=“Canada” π SName One possible order RI RI RI RI RI Index
    • 32. Wong-Yussefi algorithm assumptions and objectives
      • Assumption 1 (weak): Indexes on all join attributes (keys and foreign keys)
      • Assumption 2 (strong): At least one selection creates a small relation
        • A join with a small relation results in a small relation
      • Objective: Create sequence of index-based joins such that all intermediate results are small
    • 33. Hypergraphs CName CustKey NationKey NName Status OrderKey Quantity PartKey SuppKey PName SName
      • relation hyperedges
        • two hyperedges for same relation are possible
      • each node is an attribute
      • can extend for non-natural equality joins by merging nodes
      Nation Customer Order LineItem Product Supplier
    • 34. Small Relations/Hypergraph Reduction CName CustKey NationKey NName Status OrderKey Quantity PartKey SuppKey PName SName Nation Customer Order LineItem Product Supplier NationKey NName “ Nation” is small because it has the equality selection NName = “Canada” Nation σ NName=“Canada” Index Pick a small relation (and its conditions) to start the plan
    • 35. CName CustKey NationKey NName Status OrderKey Quantity PartKey SuppKey PName SName Nation Customer Order LineItem Product Supplier NationKey NName Nation σ NName=“Canada” Index RI (1) Remove small relation (hypergraph reduction) and color as “small” any relation that joins with the removed “small” relation Customer (2) Pick a small relation (and its conditions if any) and join it with the small relation that has been reduced
    • 36. After a bunch of steps… Nation Customer Order LineItem Product Supplier σ NName=“Canada” π SName RI RI RI RI RI Index
    • 37. Some Query Optimizers
      • The System R/Starburst: dynamic programming on left-deep join orders. Also uses heuristics to push selections and projections down the query tree.
      • DB2, SQLserver are cost-based optimizers
        • SQLserver is transformation based, also uses dynamic programming.
      • MySQL optimizer is heuristics-based (rather weak)
      • Heuristic optimization used in some versions of Oracle:
        • Repeatedly pick “best” relation to join next
          • Starting from each of n starting points. Pick best among these.
    • 38. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
        • Practicum Assignment 2
    • 39. Lecture 3
      • Query Rewriting
        • Equivalence Rules
      • Query Optimization
        • Dynamic Programming (System R/DB2)
        • Heuristics (Ingres/Postgres)
      • De-correlation of nested queries
      • Result Size Estimation
        • Practicum Assignment 2
    • 40. Optimizing Nested Subqueries
      • SQL conceptually treats nested subqueries in the where clause as functions that take parameters and return a single value or set of values
        • Parameters are variables from outer level query that are used in the nested subquery; such variables are called correlation variables
      • E.g. select customer-name from borrower where exists ( select * from depositor where depositor.customer-name = borrower.customer-name )
      • Conceptually, nested subquery is executed once for each tuple in the cross-product generated by the outer level from clause
        • Such evaluation is called correlated evaluation
        • Note: other conditions in where clause may be used to compute a join (instead of a cross-product) before executing the nested subquery
    • 41. Optimizing Nested Subqueries (Cont.)
      • Correlated evaluation may be quite inefficient since
        • a large number of calls may be made to the nested query
        • there may be unnecessary random I/O as a result
      • SQL optimizers attempt to transform nested subqueries to joins where possible, enabling use of efficient join techniques
      • E.g.: earlier nested query can be rewritten as select customer-name from borrower, depositor where depositor.customer-name = borrower.customer-name
        • Note: above query doesn’t correctly deal with duplicates, can be modified to do so as we will see
      • In general, it is not possible/straightforward to move the entire nested subquery from clause into the outer level query from clause
        • A temporary relation is created instead, and used in body of outer level query
    • 42. Optimizing Nested Subqueries (Cont.)
      • In general, SQL queries of the form below can be rewritten as shown
      • Rewrite: select … from L 1 where P 1 and exists ( select * from L 2 where P 2 )
      • To: create table t 1 as select distinct V from L 2 where P 2 1 select … from L 1 , t 1 where P 1 and P 2 2
        • P 2 1 contains predicates in P 2 that do not involve any correlation variables
        • P 2 2 reintroduces predicates involving correlation variables, with relations renamed appropriately
        • V contains all attributes used in predicates with correlation variables
    • 43. Optimizing Nested Subqueries (Cont.)
      • In our example, the original nested query would be transformed to create table t 1 as select distinct customer-name from depositor select customer-name from borrower , t 1 where t 1 . customer-name = borrower.customer-name
      • The process of replacing a nested query by a query with a join (possibly with a temporary relation) is called decorrelation .
      • Decorrelation is more complicated when
        • the nested subquery uses aggregation, or
        • when the result of the nested subquery is used to test for equality, or
        • when the condition linking the nested subquery to the other query is not exists ,
        • and so on.
    • 44. Practicum Assignment 2A
      • Get the XML metadata description for TPC-H
        • xslt script for plotting histograms
      • Take our solution for your second query (assignment 1)
      • For each operator in the tree give:
        • Selectivity
        • Intermediate Result size
        • Short description how you computed this
        • Explanation how to compute histograms on all result columns
      • Sum all intermediate result sizes into total query cost
      • DEADLINE: march 31
    • 45. The Big Picture
      • 1. Parsing and translation
      • 2. Optimization
      • 3. Evaluation
    • 46. The Big Picture
      • 1. Parsing and translation
      • 2. Optimization
      • 3. Evaluation
    • 47. Optimization
      • Query Optimization : Amongst all equivalent evaluation plans choose the one with lowest cost.
        • Cost is estimated using statistical information from the database catalog
          • e.g. number of tuples in each relation, size of tuples, etc.
      • In this lecture we study logical cost estimation
        • introduction to histograms
        • estimating the amount of tuples in the result with perfect and equi-height histograms
        • propagation of histograms into result columns
        • How to compute result size from width and #tuples
    • 48. Cost Estimation
      • Physical cost estimation
        • predict I/O blocks, seeks, cache misses, RAM consumption, …
        • Depends in the execution algorithm
      • In this lecture we study logical cost estimation
        • “ the plan with smallest intermediate result tends to be best”
        • need estimations for intermediate result sizes
      • Histogram-based estimation (practicum, assignment 2)
        • estimating the amount of tuples in the result with perfect and equi-height histograms
        • propagation of histograms into result columns
        • compute result size as tuple-width * #tuples
    • 49. Selectivities
      • select
        • expr := X (col,const): X in { <, <=, =, >=, > }
        • | expr && expr | expr || expr
        • s op (R) = |R'| / |R| .
      • join
        • only 1-n / n-1 foreign key joins
        • s join (R 1 ,R 2 ) = |R'| / (|R 1 |*|R 2 |) .
      • aggr
        • s(A(R;g 1 ;a)) = |A(R;g 1 ;a)| / |R| = distinct(R.g 1 ) / |R| .
        • s(A(R;g 1 ,g 2 ;a)) = (distinct(R.g 1 ) * distinct(R.g 2 )) / |R| .
      • project, order
        • s project (R) = s order (R) = 1
      • topn
        • s topN (R) = min(N,|R|) / |R| = min(N/|R|,1) .
    • 50. Result Size
      • #tuples_max * selectivity * #columns
        • We disregard differences in column-width
      • project:
        • #columns = |projectlist|
        • #tuples_max = |R|
      • aggr:
        • #columns = |groupbys| + |aggrs|
        • #tuples_max = min(|R|, |g1| * .. * |gn|)
      • join:
        • #columns = |child1| + |child2|
        • #tuples_max = |R1| * |R2|
      • other :
        • #columns stays equal wrt child
        • #tuples_max = |R|
    • 51. Selectivity estimation
      • We can estimate the selectivities using:
      • domain constraints
      • min/max statistics
      • histograms
    • 52. Histograms
      • Buckets:
      • B = <min, max, total, distinct>
      • Leave min out ( B i .min = B i-1 .max )
    • 53. Different Kinds of Histograms
      • Perfect
      • Equi-width
      • Equi-height
      • In the practicum we use
      • Perfect histograms, when distinct(R.a) < 25
      • Equi-height histograms of 10 buckets, otherwise
        • Not perfectly even-height: disjunct value ranges between buckets
        • (i.e. frequent value is not split over even-height buckets. It may create a bigger-than-height bucket)
    • 54. Perfect Histograms: Equi-Selection
      • s(R.a=C) = B k .total * (1/|R|)
          • in case there is a k with B k .max = C
      • s(R.a=C) = 0
        • otherwise
      a c d f total s(R.a=d)
    • 55. Perfect Histograms: Equi-Selection
      • s(R.a=C) = B k .total * (1/|R|)
          • in case there is a k with B k .max = C
      • s(R.a=C) = 0
        • otherwise
      a c d f total s(R.a=d)
    • 56. Perfect Histograms: Range-Selection
        • s(R.a<C) = sum(B i .total) * (1/|R|),
            • for all 1 <= i < k with B (k-1) .max < C <= B k .max
      a c d f total s(R.a<d)
    • 57. Perfect Histograms: Range-Selection
        • s(R.a<C) = sum(B i .total) * (1/|R|),
            • for all 1 <= i < k with B (k-1) .max < C <= B k .max
      a c d f total s(R.a<d)
    • 58. Equi-Height Histograms: Equi-Selection
        • s(R.a=C) = avg_freq(B k ) * (1/|R|)
            • in case there is a k with B (k-1) .max < C <= B k .max
            • avg_freq(B k ) = B k .total / B k .distinct
      • s(R.a=C) = 0
        • otherwise
      a d e f total s(R.a=c)
    • 59. Equi-Height Histograms: Equi-Selection
        • s(R.a=C) = avg_freq(B k ) * (1/|R|)
            • in case there is a k with B (k-1) .max < C <= B k .max
            • avg_freq(B k ) = B k .total / B k .distinct
      • s(R.a=C) = 0
            • otherwise
      a d e f total s(R.a=c)
    • 60. Equi-Height Histograms: Range-Selection
          • s(R.a<C) = ( sum(B i .total) + freq_lt(B k ,C) ) * (1/|R|),
            • for all 1 <= i < k with B (k-1) .max < C <= B k .max.
      total s(R.a<c) a d e f
    • 61. Equi-Height Histograms: Range-Selection
          • s(R.a<C) = ( sum(B i .total) + freq_lt(B k ,C) ) * (1/|R|),
            • for all 1 <= i < k with B (k-1) .max < C <= B k .max.
      total s(R.a<c) a d e f
    • 62. Select with And and Or
      • Assume no correlation between attributes:
      • s(θ a and θ c ) = s(θ a ) * s(θ c )
      • s(θ a or θ c ) = s(θ a ) + (1-s(θ a )) * s(θ c )
      • Note: must normalize θ a , θ c into non-overlapping conditions
    • 63. Foreign-key Join Selectivity/Hitrate Estimation
      • Foreign-key constraint:
      • R 1 matches at most once with R 2
      • “ each order matches on average with 7 lineitems”  hitrate = 7
      • But what if R’ 2 (e.g. order) is an intermediate result?
      • R’ 2 may have multiple key occurrences due to a previous join
      • R’ 2 may have less key occurrences (missing keys) due to a select (or join).
      • Simple Approach (practicum):
      • Hitrate *= | R’ 2 |/| R 2 |
    • 64. Aggr(R,[g1..gn],[..])
      • Can only predict groupby columns and size:
      • Expected result size =
        • min(|R|,distinct(g 1 ) * …. * distinct(g n ))
    • 65. Histogram Propagation
      • order: histogram stays identical
      • project: histogram stays identical
              • Expression (e.g. l_tax*l_price) not required for the practicum
              • possible to use cartesian product on histograms, followed by expression evaluation and re-bucketizing.
      • topn: not required for the practicum
              • Use last bucket (and backwards) to take highest N distinct values and their frequencies
      • aggr: not required for the practicum
              • Groupbys: distinct is multiplication of distincts, freq=1
              • Aggregates: only possible for global aggregates (no groupbys)
      • fk-join: multiply totals by join hitrate
              • distinct = min(distinct,total)  this is a simplicifation!
      • Select: multiply totals by selectivity
              • distinct = min(distinct,total)
      • Select (selection attribute):
              • Get totals/distincts from subset of buckets
    • 66. Practicum Assignment 2
      • Get the XML metadata description for TPC-H
        • ps/pfd histograms also available
      • Take our solution for your second query (assignment 1)
      • For each operator in the tree give:
        • Selectivity
        • Intermediate Result size
        • Short description how you computed this
        • Explanation how to compute histograms on all result columns
      • Sum all intermediate result sizes into total query cost
      • DEADLINE: march 31