Robust Access Path Selection without
Cardinality Estimation
Presented by Kenan Yao
PingCAP.com
Problem to Solve
● access path selection
● classical approach
○ predicate push down
○ ranger extracts access conditions
○ selectivity estimation / statistics
○ cost model
○ independence / uniformity assumptions
Drawbacks
● drawbacks of classical approach
○ heavily depends on statistics
○ outdated statistics / expensive to maintain
○ naive cost model
○ assumptions do not hold
○ wrong choices / not robust
Proposal
● a proposal to dodge those drawbacks
○ new scan operator in executor
○ observe and behave accordingly at runtime
○ relieve planner from access path selection
○ robust by eschewing statistics / cardinality estimation
Modeling the Problem
● page caches not considered by the cost model
● double read: each index match also needs a heap / table lookup
● cost estimates sensitive to estimation errors / not robust (see the sketch below)
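To make the sensitivity concrete, here is a minimal sketch of the classical cost comparison; the constants, formulas, and numbers are illustrative assumptions, not PostgreSQL's or TiDB's actual cost model.

    # Illustrative cost constants; no page cache is modeled.
    SEQ_PAGE_COST = 1.0       # cost of one sequential page read
    RANDOM_PAGE_COST = 4.0    # cost of one random page read

    def table_scan_cost(num_pages):
        # Full scan: read every heap page sequentially.
        return num_pages * SEQ_PAGE_COST

    def index_scan_cost(num_rows, selectivity):
        # Double read: every matching row costs an index probe plus a
        # random heap page access.
        matching_rows = num_rows * selectivity
        return matching_rows * 2 * RANDOM_PAGE_COST

    def pick_access_path(num_pages, num_rows, estimated_selectivity):
        if index_scan_cost(num_rows, estimated_selectivity) < table_scan_cost(num_pages):
            return "index scan"
        return "table scan"

    # A 10x selectivity misestimate flips the decision for a 10,000-page,
    # 1M-row table -- the choice is sensitive, hence not robust.
    print(pick_access_path(10_000, 1_000_000, 0.001))  # -> index scan
    print(pick_access_path(10_000, 1_000_000, 0.01))   # -> table scan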
Alternative Approaches
● optimizer level: not robust enough
● runtime reoptimization
○ start from index scan
○ switch to table scan if scanned rows exceed the tipping point
○ binary switch (sketched below)
○ bounded worst case
○ risk area: just past the tipping point, both scans are paid for
○ not robust
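A minimal sketch of the binary-switch idea on a toy in-memory table; treating the tipping point as a fixed row count is an assumption made here for illustration.

    def binary_switch_scan(index_rows, table_rows, pred, tipping_point):
        """Start as an index scan; if more than `tipping_point` rows match,
        abandon it and restart as a full table scan (the binary switch)."""
        produced = []
        for row in index_rows:                # rows in index (key) order
            if pred(row):
                produced.append(row)
                if len(produced) > tipping_point:
                    # All index-scan work so far is discarded, so selectivities
                    # just past the tipping point pay for both scans: the worst
                    # case is bounded (roughly 2x) but there is a risk area.
                    return [r for r in table_rows if pred(r)]
        return produced

    table_rows = [(k, f"payload-{k}") for k in range(1_000)]
    index_rows = sorted(table_rows)           # toy "index": same rows, key order
    result = binary_switch_scan(index_rows, table_rows, lambda r: r[0] % 2 == 0, 100)
    print(len(result))                        # 500 -- the scan switched midway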
Smooth Scan
● rough idea
○ executor operator / runtime
○ start from index scan
○ morph between index scan and table scan
○ morph continuously and adaptively
○ trade off CPU and memory for I/O reduction
Storage Model
● PostgreSQL style: heap table / B+-tree index
● access path types
○ table scan / index scan
○ bitmap scan (sketched below)
■ reduces random access / cache misses
■ execution model: pipeline breaker
■ order property lost
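A minimal sketch of the bitmap-scan idea on a toy heap layout (tuples addressed by (page_no, slot)); collecting every matching TID up front is what makes it a pipeline breaker, and fetching pages in page order is why the index order property is lost.

    from collections import defaultdict

    # Toy layout: pages[page_no][slot] holds a tuple; the toy "index" is a list
    # of (key, (page_no, slot)) entries in key order.

    def bitmap_scan(index_entries, pages, key_pred):
        # Phase 1 (pipeline breaker): collect all matching TIDs from the index.
        slots_by_page = defaultdict(list)
        for key, (page_no, slot) in index_entries:
            if key_pred(key):
                slots_by_page[page_no].append(slot)
        # Phase 2: visit heap pages in page order -- each page is read once,
        # reducing random access and cache misses, but key order is gone.
        result = []
        for page_no in sorted(slots_by_page):
            for slot in sorted(slots_by_page[page_no]):
                result.append(pages[page_no][slot])
        return result

    pages = [[f"row-{p}-{s}" for s in range(4)] for p in range(3)]
    index_entries = [(k, (k % 3, k // 3)) for k in range(12)]
    print(bitmap_scan(index_entries, pages, lambda k: k % 2 == 0))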
Smooth Scan Details
● targeted behavior
○ near-optimal
○ bulletproof against estimation errors
○ no performance cliff / robust
Morphing Mechanism
● start from simple index scan
○ monitor retrieved row count
● start morphing if selectivity exceeds threshold
○ probe entire heap page
○ fetch and probe adjacent heap pages if selectivity keeps increasing (sketched below)
■ start with one extra page
■ morphing region size increases exponentially
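A minimal sketch of this morphing loop on the same toy heap layout as above; the growth schedule and the forward-only region are simplifying assumptions, and duplicate elimination is deferred to the next slide.

    # Toy layout: pages[page_no][slot] = (key, payload); the toy index lists
    # (key, (page_no, slot)) entries in key order.

    def smooth_scan(index_entries, pages, pred, threshold):
        """Start as a plain index scan; once more than `threshold` matching rows
        have been retrieved, morph: probe the entire heap page of each fetched
        tuple plus an adjacent region that grows exponentially.
        NOTE: without the bookkeeping of the next slide this yields duplicates."""
        retrieved = 0
        region = 0                                   # extra adjacent pages to probe
        for key, (page_no, slot) in index_entries:
            tup = pages[page_no][slot]
            if not pred(tup):
                continue
            retrieved += 1
            if retrieved <= threshold:               # behave like an index scan
                yield tup
                continue
            region = 1 if region == 0 else region * 2    # 1, 2, 4, 8, ...
            last_page = min(page_no + region, len(pages) - 1)
            for p in range(page_no, last_page + 1):      # probe whole pages
                for heap_tup in pages[p]:
                    if pred(heap_tup):
                        yield heap_tup

    pages = [[(p * 4 + s, f"payload-{p}-{s}") for s in range(4)] for p in range(8)]
    index_entries = [(k, divmod(k, 4)) for k in range(32)]
    rows = list(smooth_scan(index_entries, pages, lambda t: t[0] % 2 == 0, threshold=3))
    print(len(rows))   # > 16: duplicates remain until deduplication is added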
Correctness
● no tuple missed
○ driven by index scan
● no duplicate tuple
○ bookkeeping
○ CPU / memory cost
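One possible shape of that bookkeeping, sketched below: a set of fully probed pages and a set of already returned tuple IDs. The index-driven outer loop guarantees no tuple is missed; these sets guarantee none is returned twice, at the CPU and memory cost the slide notes.

    def probe_page_once(page_no, pages, pred, probed_pages, returned_tids):
        """Probe one heap page, returning only tuples not produced before."""
        if page_no in probed_pages:            # page already exhausted
            return []
        fresh = []
        for slot, tup in enumerate(pages[page_no]):
            tid = (page_no, slot)
            if tid not in returned_tids and pred(tup):
                returned_tids.add(tid)         # remember what was handed out
                fresh.append(tup)
        probed_pages.add(page_no)              # remember the page is done
        return fresh

    pages = [[("a", 1), ("b", 2)], [("c", 3), ("d", 4)]]
    probed, returned = set(), set()
    print(probe_page_once(0, pages, lambda t: True, probed, returned))  # both tuples
    print(probe_page_once(0, pages, lambda t: True, probed, returned))  # [] -- no duplicates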
Morphing Policy
● greedy policy
○ fast convergence for high selectivity
○ unnecessary overhead for low selectivity
● selectivity driven policy
○ monitor local and global selectivity
○ selectivity computed at the page level
○ keep or increase morphing region size
● elastic policy
○ skewed data distribution
○ double morphing region size for dense region
○ halve morphing region size for sparse region
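A minimal sketch of the elastic policy; comparing the region's local selectivity against the global selectivity to decide dense vs. sparse is an assumption made here for illustration.

    def elastic_region_size(region_size, region_matches, region_tuples,
                            total_matches, total_tuples,
                            min_size=1, max_size=1024):
        """Adapt the morphing region to skewed data: double it in dense
        regions, halve it in sparse ones. Selectivity is computed per region."""
        local_sel = region_matches / max(region_tuples, 1)
        global_sel = total_matches / max(total_tuples, 1)
        if local_sel >= global_sel:                   # dense region
            return min(region_size * 2, max_size)
        return max(region_size // 2, min_size)        # sparse region

    # A region with 30/100 matches against 10% global selectivity doubles 4 -> 8;
    # a region with 2/100 matches halves 4 -> 2.
    print(elastic_region_size(4, 30, 100, 1_000, 10_000))  # 8
    print(elastic_region_size(4, 2, 100, 1_000, 10_000))   # 2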
Threshold
● optimizer driven
○ retrieved row count exceeds optimizer’s estimate
● eager approach
○ start morphing from first tuple
● SLA driven
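The three triggers differ only in where the threshold comes from; the sketch below is an illustration, and sla_row_budget in particular is a hypothetical knob standing in for whatever the SLA translates into.

    def morphing_threshold(policy, optimizer_estimate=None, sla_row_budget=None):
        """Number of index-retrieved rows tolerated before morphing starts."""
        if policy == "eager":
            return 0                     # morph from the first tuple onward
        if policy == "optimizer":
            return optimizer_estimate    # morph once the optimizer's estimate is exceeded
        if policy == "sla":
            return sla_row_budget        # hypothetical: a row budget derived from the SLA
        raise ValueError(f"unknown policy: {policy}")

    print(morphing_threshold("optimizer", optimizer_estimate=1_000))  # 1000
    print(morphing_threshold("eager"))                                # 0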
Implementation
● integrated with PostgreSQL
○ page ID cache
○ tuple ID cache
○ result cache to respect order property
■ hash-based data structure
■ store additional tuples found
■ check result cache before index probe
■ pipeline breaker to some extent
■ spill to disk when short of memory
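A minimal sketch of the result-cache idea (names are illustrative, not PostgreSQL's): tuples discovered while probing whole pages are parked in a hash map keyed by tuple ID, and every index probe first checks the cache, so results still come out in index order; the spill-to-disk path is omitted.

    class ResultCache:
        """Parks tuples found during morphing so they are emitted only when the
        index delivers their TID, preserving the index order property."""
        def __init__(self):
            self._by_tid = {}                 # hash-based: (page_no, slot) -> tuple

        def put(self, tid, tup):
            self._by_tid[tid] = tup           # tuple found while probing a page

        def take(self, tid):
            # Checked before each index probe; a hit avoids a heap page access.
            return self._by_tid.pop(tid, None)

    cache = ResultCache()
    cache.put((7, 3), (31, "payload-7-3"))
    print(cache.take((7, 3)))   # served from the cache, no heap access needed
    print(cache.take((7, 3)))   # None -- each tuple is returned exactly once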
Evaluation
● TPC-H and synthetic datasets
● clear database and OS caches
Any Questions?
Thank You!

Editor's Notes

  • #3 That is, the familiar choice between a table scan and an index scan.
  • #9 Unlike MySQL, which stores the table as a clustered index.
  • #13 The policy used after morphing is triggered.
  • #15 The authors argue that spilling is sequential I/O, so its cost is lower than random access.
  • #16 The evaluation section is quite long; overall, the improvement is larger when selectivity is low. Q6 benefits because of a dense region, Q7 because the optimizer underestimates the selectivity, and Q14 probably for both reasons.