Inaccurate cardinality estimates lead to suboptimal query plans. To address this, the paper proposes a tighter upper bound on intermediate join cardinalities, based on the concept of a hypergraph.
2. PingCAP.com
Introduction & Background
● How to choose a join order?
○ Binary join reordering algorithm: needs a good row-count estimation model.
○ Intermediate join count estimation: needs a precise cardinality calculation.
○ Cardinality of a join plan: how to estimate it?
○ Naive solution:
join cardinality = max(left key cardinality, right key cardinality)
● What does cardinality underestimation cause?
○ Intermediate join sizes are estimated to be smaller than they really are.
○ The optimizer may then choose a very slow join order.
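The naive formula above can underestimate badly when join keys are skewed. The following is a minimal sketch, assuming "key cardinality" means the row count on each side (it could also be read as the number of distinct key values); the function names and the input data are hypothetical.

```python
from collections import Counter

def naive_join_estimate(left_cardinality, right_cardinality):
    """Naive estimate from the slide: the larger of the two input cardinalities."""
    return max(left_cardinality, right_cardinality)

def true_join_size(left_keys, right_keys):
    """Exact equi-join size: sum over keys of (left count * right count)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(n * right_counts[k] for k, n in left_counts.items())

# Hypothetical skewed inputs: every row carries the same join key.
left_keys = [1] * 100
right_keys = [1] * 50

estimate = naive_join_estimate(len(left_keys), len(right_keys))
actual = true_join_size(left_keys, right_keys)
print(estimate, actual)  # 100 5000
```

Here the estimate is 50 times smaller than the real join size, which is the kind of gap that can push an optimizer toward a poor join order.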
Introduction & Background
● What does the paper do?
○ An approach to tightening join cardinality upper bounds.
○ A method of generating entropic bounding formulas.
○ A partition budgeting strategy to control space and time complexity.
○ A demonstration of the approach on multiple datasets.
● Prerequisite: hypergraphs
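As a reminder of the prerequisite, a join query can be viewed as a hypergraph whose vertices are attributes and whose hyperedges are relations. The sketch below assumes a hypothetical triangle query R(a, b), S(b, c), T(c, a); the relation names, attribute names, and helper functions are illustrative only.

```python
# Hypothetical triangle query: R(a, b) JOIN S(b, c) JOIN T(c, a).
# Vertices are attributes; each relation is a hyperedge over its attributes.
hyperedges = {
    "R": frozenset({"a", "b"}),
    "S": frozenset({"b", "c"}),
    "T": frozenset({"c", "a"}),
}

def vertices(edges):
    """All attributes that appear in at least one relation."""
    return frozenset().union(*edges.values())

def relations_sharing(edges, attribute):
    """Relations (hyperedges) that contain the given attribute."""
    return sorted(name for name, attrs in edges.items() if attribute in attrs)

print(sorted(vertices(hyperedges)))        # ['a', 'b', 'c']
print(relations_sharing(hyperedges, "b"))  # ['R', 'S']
```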
Approach
● How to decide the hash partition budget?
○ Fix the size for each attribute?
○ Fix the total size summed over all attributes?
● Hash Partition Budgeting
○ fixed max size ! for each attribute.
○ Distribute the budget to attributes that are covered unconditionally by a relation.
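To make the idea of a per-attribute cap concrete, here is a minimal sketch of hash partitioning one relation on one attribute with a fixed maximum number of buckets. It is an illustration under assumptions, not the paper's procedure: the choice of `zlib.crc32` as the hash, the statistics kept per bucket (row count and maximum degree), and all names are hypothetical, and the rule for distributing the budget across attributes is not modeled.

```python
import zlib
from collections import Counter, defaultdict

def bucket_of(value, num_buckets):
    """Deterministic hash bucket for a value (crc32 is an arbitrary choice)."""
    return zlib.crc32(repr(value).encode("utf-8")) % num_buckets

def partition_stats(rows, attr_index, num_buckets):
    """Per-bucket row count and max degree of the partitioning attribute."""
    per_bucket = defaultdict(Counter)
    for row in rows:
        value = row[attr_index]
        per_bucket[bucket_of(value, num_buckets)][value] += 1
    return {
        bucket: {"count": sum(c.values()), "max_degree": max(c.values())}
        for bucket, c in per_bucket.items()
    }

# Hypothetical relation with two columns; partition on column 0 with at most 2 buckets.
rows = [(1, "x"), (1, "y"), (2, "z"), (3, "w")]
stats = partition_stats(rows, attr_index=0, num_buckets=2)
print(stats)
```

Whatever the bucket assignment, the per-bucket counts add up to the relation's row count, so the cap bounds the space used per attribute without dropping rows.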
Approach
● Bound Formula Generation
○ Enumerate all relation cover sets in the hypergraph.
○ For each relation in a cover set, calculate its contribution to the formula.
● Example
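The enumeration step can be sketched as follows. This is a deliberately simplified illustration: it assumes every relation in a cover contributes its plain row count (unconditional coverage only), so conditional coverage is not modeled. The triangle query, the row counts, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod

def cover_sets(hyperedges):
    """Yield every set of relations whose attributes cover all vertices."""
    all_attrs = frozenset().union(*hyperedges.values())
    names = sorted(hyperedges)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = frozenset().union(*(hyperedges[n] for n in subset))
            if covered == all_attrs:
                yield subset

def simple_upper_bound(hyperedges, row_counts):
    """Smallest product of row counts over all covers (simplified contributions)."""
    return min(prod(row_counts[n] for n in cover) for cover in cover_sets(hyperedges))

# Hypothetical triangle query R(a, b), S(b, c), T(c, a) with made-up sizes.
hyperedges = {
    "R": frozenset({"a", "b"}),
    "S": frozenset({"b", "c"}),
    "T": frozenset({"c", "a"}),
}
row_counts = {"R": 10, "S": 20, "T": 30}

print(list(cover_sets(hyperedges)))
# [('R', 'S'), ('R', 'T'), ('S', 'T'), ('R', 'S', 'T')]
print(simple_upper_bound(hyperedges, row_counts))  # 200
```

Each cover yields a valid upper bound because every output tuple is determined by one tuple from each relation in the cover; taking the minimum over covers keeps the tightest of these simplified bounds.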