Sampling based Histogram in MariaDB

1. Sampling based histogram

2. Current implementation
● ANALYZE TABLE ... PERSISTENT FOR ALL
● Collects the histogram by doing a full table scan
● Histogram is stored as equal-height, in 256 bytes
● Stored in the mysql database, in the column_stats table
● Stores all values in memory (or on disk if needed)
● Slow
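To make the cost of the current approach concrete, here is a minimal Python sketch of building an equal-height histogram by sorting every value. It is an illustration only, not MariaDB's actual code: the function name `equal_height_bounds` and the way bucket bounds are picked are assumptions made for this example.

```python
def equal_height_bounds(values, n_buckets):
    """Return one upper bound per bucket so each bucket holds about the same number of rows.

    Hypothetical helper for illustration; not taken from the MariaDB source.
    """
    ordered = sorted(values)  # every value is materialized and sorted: O(n log n)
    n = len(ordered)
    if n == 0:
        return []
    bounds = []
    for i in range(1, n_buckets + 1):
        # Index of the last row that belongs to bucket i.
        idx = max(0, (i * n) // n_buckets - 1)
        bounds.append(ordered[idx])
    return bounds


if __name__ == "__main__":
    print(equal_height_bounds(range(1, 101), 4))
```

The sort is what forces all values to be kept in memory (or spilled to disk) before any bucket bound can be emitted.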

3. Improvements
● Collect the histogram using samples
● Avoid sorting, which costs O(#rows log(#rows))

4. New implementation
● The user specifies the sampling percentage
● Min and max are needed in order to build the histogram
● Equal-width histogram
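The idea on this slide can be sketched in Python as follows. This is a simplified illustration under stated assumptions: per-row (Bernoulli) sampling is one possible way to honor a user-given percentage, and the names `sample_rows` and `equal_width_histogram` are hypothetical. Values outside [lo, hi] are simply skipped here.

```python
import random


def sample_rows(rows, percent, rng):
    """Keep each row independently with probability percent / 100 (assumed sampling scheme)."""
    p = percent / 100.0
    return [r for r in rows if rng.random() < p]


def equal_width_histogram(values, lo, hi, n_buckets):
    """Count values into n_buckets buckets of equal width spanning [lo, hi]. No sorting needed."""
    counts = [0] * n_buckets
    width = (hi - lo) / n_buckets
    for v in values:
        if v < lo or v > hi:
            continue  # out-of-range values are ignored in this simplified sketch
        if width > 0:
            # v == hi would give index n_buckets, so clamp to the last bucket.
            idx = min(int((v - lo) / width), n_buckets - 1)
        else:
            idx = 0  # all values identical: everything lands in the first bucket
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    rows = [rng.uniform(0, 1000) for _ in range(10_000)]
    sampled = sample_rows(rows, percent=10, rng=rng)
    print(equal_width_histogram(sampled, min(sampled), max(sampled), 8))
```

Because the bucket index is computed directly from min, max, and the value, each sampled row is processed in constant time, which is why min and max must be known up front.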

5. Sampling steps
● First histogram: sample values to get a good estimate for min and max, then sample again to construct the histogram
● The buckets lie between min and max, plus 2 extra buckets for values > max and < min
● If min and max are already known (or a histogram already exists), start sampling right away
● If too many values fall below min or above max, change min and max and restart sampling
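The steps above can be sketched as a small Python routine. This is an assumption-laden illustration, not the MariaDB implementation: the function name `build_sampled_histogram`, the outlier threshold `max_outlier_frac`, the restart cap `max_restarts`, and the use of per-row Bernoulli sampling are all hypothetical choices made for the example.

```python
import random


def build_sampled_histogram(rows, percent, n_buckets, rng,
                            max_outlier_frac=0.05, max_restarts=5):
    """Two-pass sampled equal-width histogram with 2 overflow buckets.

    Returns (lo, hi, counts), where counts[0] holds values < lo and
    counts[-1] holds values > hi, or None if the first sample is empty.
    """
    p = percent / 100.0

    def sample():
        return [r for r in rows if rng.random() < p]

    # Pass 1: sample to estimate min and max.
    first = sample()
    if not first:
        return None
    lo, hi = min(first), max(first)

    for attempt in range(max_restarts + 1):
        counts = [0] * (n_buckets + 2)
        width = (hi - lo) / n_buckets
        seen_lo, seen_hi = lo, hi
        total = 0
        # Pass 2: sample again and fill the buckets.
        for v in sample():
            total += 1
            seen_lo, seen_hi = min(seen_lo, v), max(seen_hi, v)
            if v < lo:
                counts[0] += 1
            elif v > hi:
                counts[-1] += 1
            else:
                idx = min(int((v - lo) / width), n_buckets - 1) if width > 0 else 0
                counts[1 + idx] += 1
        outliers = counts[0] + counts[-1]
        # Accept if few values fell outside [lo, hi], or if we ran out of restarts.
        if outliers <= max_outlier_frac * total or attempt == max_restarts:
            return lo, hi, counts
        # Too many outliers: widen the range to what was observed and restart.
        lo, hi = seen_lo, seen_hi


if __name__ == "__main__":
    rng = random.Random(1)
    rows = [rng.gauss(50, 10) for _ in range(20_000)]
    print(build_sampled_histogram(rows, percent=5, n_buckets=10, rng=rng))
```

If min and max are already known, the first pass can be skipped and `lo` and `hi` supplied directly. How many outliers count as "too many" is not stated on the slide, so the 5% threshold here is only a placeholder.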
