Robust Access Path Selection without
Cardinality Estimation
Presented by Kenan Yao
PingCAP.com
Problem to Solve
● access path selection
● classical approach
○ predicate push down
○ ranger extracts access conditions
○ selectivity estimation / statistics
○ cost model
○ independence / uniformity assumptions
Drawbacks
● drawbacks of classical approach
○ heavily depends on statistics
○ outdated statistics / expensive to maintain
○ naive cost model
○ assumptions do not hold
○ wrong choices / not robust
Proposal
● a proposal to dodge those drawbacks
○ new scan operator in executor
○ observe and behave accordingly at runtime
○ relieve planner from access path selection
○ robust by eschewing statistics / cardinality estimation
Modeling the Problem
● page caches not considered by the cost model
● double read: each index match also needs a heap / table lookup
● cost estimates sensitive to estimation errors / not robust (see the sketch below)
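To make the sensitivity concrete, here is a minimal sketch of the classical cost comparison; the constants, formulas, and numbers are illustrative assumptions, not PostgreSQL's or TiDB's actual cost model.

    # Illustrative cost constants; no page cache is modeled.
    SEQ_PAGE_COST = 1.0       # cost of one sequential page read
    RANDOM_PAGE_COST = 4.0    # cost of one random page read

    def table_scan_cost(num_pages):
        # Full scan: read every heap page sequentially.
        return num_pages * SEQ_PAGE_COST

    def index_scan_cost(num_rows, selectivity):
        # Double read: every matching row costs an index probe plus a
        # random heap page access.
        matching_rows = num_rows * selectivity
        return matching_rows * 2 * RANDOM_PAGE_COST

    def pick_access_path(num_pages, num_rows, estimated_selectivity):
        if index_scan_cost(num_rows, estimated_selectivity) < table_scan_cost(num_pages):
            return "index scan"
        return "table scan"

    # A 10x selectivity misestimate flips the decision for a 10,000-page,
    # 1M-row table -- the choice is sensitive, hence not robust.
    print(pick_access_path(10_000, 1_000_000, 0.001))  # -> index scan
    print(pick_access_path(10_000, 1_000_000, 0.01))   # -> table scan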
Alternative Approaches
● optimizer level: not robust enough
● runtime reoptimization
○ start from index scan
○ switch to table scan if scanned rows exceed the tipping point
○ binary switch (sketched below)
○ bounded worst case
○ risk area: just past the tipping point, both scans are paid for
○ not robust
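A minimal sketch of the binary-switch idea on a toy in-memory table; treating the tipping point as a fixed row count is an assumption made here for illustration.

    def binary_switch_scan(index_rows, table_rows, pred, tipping_point):
        """Start as an index scan; if more than `tipping_point` rows match,
        abandon it and restart as a full table scan (the binary switch)."""
        produced = []
        for row in index_rows:                # rows in index (key) order
            if pred(row):
                produced.append(row)
                if len(produced) > tipping_point:
                    # All index-scan work so far is discarded, so selectivities
                    # just past the tipping point pay for both scans: the worst
                    # case is bounded (roughly 2x) but there is a risk area.
                    return [r for r in table_rows if pred(r)]
        return produced

    table_rows = [(k, f"payload-{k}") for k in range(1_000)]
    index_rows = sorted(table_rows)           # toy "index": same rows, key order
    result = binary_switch_scan(index_rows, table_rows, lambda r: r[0] % 2 == 0, 100)
    print(len(result))                        # 500 -- the scan switched midway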
Smooth Scan
● rough idea
○ executor operator / runtime
○ start from index scan
○ morph between index scan and table scan
○ morph continuously and adaptively
○ trade off CPU and memory for I/O reduction
Storage Model
● PostgreSQL style: heap table / B+-tree index
● access path types
○ table scan / index scan
○ bitmap scan (sketched below)
■ reduces random access / cache misses
■ execution model: pipeline breaker
■ order property lost
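A minimal sketch of the bitmap-scan idea on a toy heap layout (tuples addressed by (page_no, slot)); collecting every matching TID up front is what makes it a pipeline breaker, and fetching pages in page order is why the index order property is lost.

    from collections import defaultdict

    # Toy layout: pages[page_no][slot] holds a tuple; the toy "index" is a list
    # of (key, (page_no, slot)) entries in key order.

    def bitmap_scan(index_entries, pages, key_pred):
        # Phase 1 (pipeline breaker): collect all matching TIDs from the index.
        slots_by_page = defaultdict(list)
        for key, (page_no, slot) in index_entries:
            if key_pred(key):
                slots_by_page[page_no].append(slot)
        # Phase 2: visit heap pages in page order -- each page is read once,
        # reducing random access and cache misses, but key order is gone.
        result = []
        for page_no in sorted(slots_by_page):
            for slot in sorted(slots_by_page[page_no]):
                result.append(pages[page_no][slot])
        return result

    pages = [[f"row-{p}-{s}" for s in range(4)] for p in range(3)]
    index_entries = [(k, (k % 3, k // 3)) for k in range(12)]
    print(bitmap_scan(index_entries, pages, lambda k: k % 2 == 0))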
Smooth Scan Details
● targeted behavior
○ near-optimal
○ bulletproof against estimation errors
○ no performance cliff / robust
Morphing Mechanism
● start from simple index scan
○ monitor retrieved row count
● start morphing if selectivity exceeds threshold
○ probe entire heap page
○ fetch and probe adjacent heap pages if selectivity keeps increasing (sketched below)
■ start with one extra page
■ morphing region size increases exponentially
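A minimal sketch of this morphing loop on the same toy heap layout as above; the growth schedule and the forward-only region are simplifying assumptions, and duplicate elimination is deferred to the next slide.

    # Toy layout: pages[page_no][slot] = (key, payload); the toy index lists
    # (key, (page_no, slot)) entries in key order.

    def smooth_scan(index_entries, pages, pred, threshold):
        """Start as a plain index scan; once more than `threshold` matching rows
        have been retrieved, morph: probe the entire heap page of each fetched
        tuple plus an adjacent region that grows exponentially.
        NOTE: without the bookkeeping of the next slide this yields duplicates."""
        retrieved = 0
        region = 0                                   # extra adjacent pages to probe
        for key, (page_no, slot) in index_entries:
            tup = pages[page_no][slot]
            if not pred(tup):
                continue
            retrieved += 1
            if retrieved <= threshold:               # behave like an index scan
                yield tup
                continue
            region = 1 if region == 0 else region * 2    # 1, 2, 4, 8, ...
            last_page = min(page_no + region, len(pages) - 1)
            for p in range(page_no, last_page + 1):      # probe whole pages
                for heap_tup in pages[p]:
                    if pred(heap_tup):
                        yield heap_tup

    pages = [[(p * 4 + s, f"payload-{p}-{s}") for s in range(4)] for p in range(8)]
    index_entries = [(k, divmod(k, 4)) for k in range(32)]
    rows = list(smooth_scan(index_entries, pages, lambda t: t[0] % 2 == 0, threshold=3))
    print(len(rows))   # > 16: duplicates remain until deduplication is added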
Correctness
● no tuple missed
○ driven by index scan
● no duplicate tuple
○ bookkeeping
○ CPU / memory cost
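One possible shape of that bookkeeping, sketched below: a set of fully probed pages and a set of already returned tuple IDs. The index-driven outer loop guarantees no tuple is missed; these sets guarantee none is returned twice, at the CPU and memory cost the slide notes.

    def probe_page_once(page_no, pages, pred, probed_pages, returned_tids):
        """Probe one heap page, returning only tuples not produced before."""
        if page_no in probed_pages:            # page already exhausted
            return []
        fresh = []
        for slot, tup in enumerate(pages[page_no]):
            tid = (page_no, slot)
            if tid not in returned_tids and pred(tup):
                returned_tids.add(tid)         # remember what was handed out
                fresh.append(tup)
        probed_pages.add(page_no)              # remember the page is done
        return fresh

    pages = [[("a", 1), ("b", 2)], [("c", 3), ("d", 4)]]
    probed, returned = set(), set()
    print(probe_page_once(0, pages, lambda t: True, probed, returned))  # both tuples
    print(probe_page_once(0, pages, lambda t: True, probed, returned))  # [] -- no duplicates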
Morphing Policy
● greedy policy
○ fast convergence for high selectivity
○ unnecessary overhead for low selectivity
● selectivity driven policy
○ monitor local and global selectivity
○ selectivity computed at the page level
○ keep or increase morphing region size
● elastic policy
○ skewed data distribution
○ double morphing region size for dense region
○ halve morphing region size for sparse region
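A minimal sketch of the elastic policy; comparing the region's local selectivity against the global selectivity to decide dense vs. sparse is an assumption made here for illustration.

    def elastic_region_size(region_size, region_matches, region_tuples,
                            total_matches, total_tuples,
                            min_size=1, max_size=1024):
        """Adapt the morphing region to skewed data: double it in dense
        regions, halve it in sparse ones. Selectivity is computed per region."""
        local_sel = region_matches / max(region_tuples, 1)
        global_sel = total_matches / max(total_tuples, 1)
        if local_sel >= global_sel:                   # dense region
            return min(region_size * 2, max_size)
        return max(region_size // 2, min_size)        # sparse region

    # A region with 30/100 matches against 10% global selectivity doubles 4 -> 8;
    # a region with 2/100 matches halves 4 -> 2.
    print(elastic_region_size(4, 30, 100, 1_000, 10_000))  # 8
    print(elastic_region_size(4, 2, 100, 1_000, 10_000))   # 2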
Threshold
● optimizer driven
○ retrieved row count exceeds optimizer’s estimate
● eager approach
○ start morphing from first tuple
● SLA driven
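The three triggers differ only in where the threshold comes from; the sketch below is an illustration, and sla_row_budget in particular is a hypothetical knob standing in for whatever the SLA translates into.

    def morphing_threshold(policy, optimizer_estimate=None, sla_row_budget=None):
        """Number of index-retrieved rows tolerated before morphing starts."""
        if policy == "eager":
            return 0                     # morph from the first tuple onward
        if policy == "optimizer":
            return optimizer_estimate    # morph once the optimizer's estimate is exceeded
        if policy == "sla":
            return sla_row_budget        # hypothetical: a row budget derived from the SLA
        raise ValueError(f"unknown policy: {policy}")

    print(morphing_threshold("optimizer", optimizer_estimate=1_000))  # 1000
    print(morphing_threshold("eager"))                                # 0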
Implementation
● integrated with PostgreSQL
○ page ID cache
○ tuple ID cache
○ result cache to respect order property
■ hash-based data structure
■ store additional tuples found
■ check result cache before index probe
■ pipeline breaker to some extent
■ spill to disk when short of memory
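A minimal sketch of the result-cache idea (names are illustrative, not PostgreSQL's): tuples discovered while probing whole pages are parked in a hash map keyed by tuple ID, and every index probe first checks the cache, so results still come out in index order; the spill-to-disk path is omitted.

    class ResultCache:
        """Parks tuples found during morphing so they are emitted only when the
        index delivers their TID, preserving the index order property."""
        def __init__(self):
            self._by_tid = {}                 # hash-based: (page_no, slot) -> tuple

        def put(self, tid, tup):
            self._by_tid[tid] = tup           # tuple found while probing a page

        def take(self, tid):
            # Checked before each index probe; a hit avoids a heap page access.
            return self._by_tid.pop(tid, None)

    cache = ResultCache()
    cache.put((7, 3), (31, "payload-7-3"))
    print(cache.take((7, 3)))   # served from the cache, no heap access needed
    print(cache.take((7, 3)))   # None -- each tuple is returned exactly once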
Evaluation
● TPC-H and synthetic datasets
● clear database and OS caches
Any Questions?
Thank You!

Editor's Notes

  • #3 That is, the familiar choice between a table scan and an index scan.
  • #9 Unlike MySQL, which stores the table as a clustered index.
  • #13 The policy used after morphing is triggered.
  • #15 The authors argue that spilling is sequential I/O, so its cost is lower than random access.
  • #16 The evaluation section is quite long; overall, the improvement is larger when selectivity is low. Q6 benefits because of a dense region, Q7 because the optimizer underestimates the selectivity, and Q14 probably for both reasons.