[Paper Reading] Efficient Query Processing with Optimistically Compressed Hash Tables & Strings in the USSR

Efficient Query Processing with Optimistically
Compressed Hash Tables & Strings in the USSR
Presented by Huaiyu Xu

PingCAP.com
Motivation
● Hash tables frequently used in analytical queries
● Crucial for overall performance
● But (large) HTs bottlenecked by main memory bandwidth
What can we do about it?

Motivation
Orthogonal approaches:
● Optimize access
○ partitioning
● Increase fill-rate
○ Cuckoo, Robin Hood hashing
● Shrink the table itself
○ reduce the bucket/row size (has not received as much attention), which in turn increases cache efficiency

Shrinking Hash Tables
Suppose a 100 MiB hash table magically shrinks by 10x:
● Increase query throughput
● Downsize your computer
Bonus:
● A 10 MiB HT fits into the L3/LLC cache
● Improved runtime
Better Latency & Throughput

Shrinking Hash Tables
● Domain-guided prefix suppression
● Optimistic splitting
● Unique Strings Self-aligned Region (USSR)

Domain-guided prefix suppression
● Domain-guided:
○ per-column min/max information from metadata
● Prefix Suppression
○ subtract the domain minimum from each value
○ pack multiple columns together
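
The two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names (`bits_needed`, `pack_row`, `unpack_row`) and the example domains are made up here, and a real engine would do this with vectorized kernels over fixed-width integers rather than Python's arbitrary-precision ints.

```python
def bits_needed(lo, hi):
    """Bits required for any value in [lo, hi] once lo has been subtracted."""
    return max(1, (hi - lo).bit_length())


def pack_row(values, domains):
    """Subtract each column's domain minimum, then concatenate the bit fields."""
    word = 0
    shift = 0
    for value, (lo, hi) in zip(values, domains):
        word |= (value - lo) << shift
        shift += bits_needed(lo, hi)
    return word


def unpack_row(word, domains):
    """Inverse of pack_row: mask out each field and add the minimum back."""
    values = []
    for lo, hi in domains:
        width = bits_needed(lo, hi)
        values.append((word & ((1 << width) - 1)) + lo)
        word >>= width
    return values


# Hypothetical per-column (min, max) pairs, as they might come from metadata.
domains = [(1000, 1255), (-5, 10), (2000, 2023)]
row = [1200, 3, 2021]

packed = pack_row(row, domains)
total_bits = sum(bits_needed(lo, hi) for lo, hi in domains)
```

With these domains the three columns need 8, 4 and 5 bits, so the whole key fits in 17 bits instead of three full-width integers. Because packing is a bijection for in-domain values, two rows are equal exactly when their packed words are equal.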

Domain-guided prefix suppression
● Compression and Decompression
○ lightweight: a handful of bitwise operations
○ fast equality comparisons on compressed data
● Generating Pre-Compiled Kernels
○ restrict the number of inputs to 4
○ restrict the types we pack into to 32-, 64- and 128-bit unsigned integers
○ impose an order on the inputs
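
The effect of these restrictions is that only a small, fixed set of kernels has to exist ahead of time. A minimal sketch of picking one, assuming the input order is "descending bit width" (the slide does not say which order is imposed, so that choice and the name `choose_kernel` are hypothetical):

```python
WORD_WIDTHS = (32, 64, 128)  # packed-word types allowed by the restriction
MAX_INPUTS = 4               # at most four columns per kernel


def choose_kernel(bit_widths):
    """Return (word width, canonical input order) for a set of column widths."""
    if not 1 <= len(bit_widths) <= MAX_INPUTS:
        raise ValueError("a kernel takes between 1 and 4 inputs")
    # Assumed canonical order: widest column first. Any fixed order works;
    # the point is that permutations of the same inputs share one kernel.
    ordered = tuple(sorted(bit_widths, reverse=True))
    total = sum(ordered)
    for width in WORD_WIDTHS:
        if total <= width:
            return width, ordered
    raise ValueError("columns do not fit into a single 128-bit word")
```

For example, `choose_kernel([4, 8, 5])` and `choose_kernel([8, 5, 4])` select the same 32-bit kernel, which is what keeps the number of pre-compiled variants manageable.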

Optimistic Splitting
● decrease effective memory footprint
● Example: select sum(a) from t; a int64 (8 bytes) → sum decimal + a + a (40 bytes)
● int64 / decimal
● Decompose HT into:
Hot HT:
● Frequently accessed
● Cache-resident
● Aggregates:
○ SUM: sub-sums fit into a smaller data type
Cold HT:
● Rarely accessed
● Main memory
● Aggregates:
○ SUM: store full SUM or overflow counter
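
A minimal sketch of the overflow-counter variant of the split SUM, assuming non-negative inputs. The class name `SplitSum`, the `hot_bits` parameter and the default width of 32 bits are illustrative assumptions, not taken from the slides.

```python
class SplitSum:
    """SUM aggregate split into a small hot part and a rarely touched cold part."""

    def __init__(self, hot_bits=32):
        self.hot_bits = hot_bits
        self.hot = 0   # partial sum; would live in the cache-resident hot HT
        self.cold = 0  # overflow counter; would live in the cold HT

    def add(self, value):
        """Add a non-negative value; only touch the cold part on overflow."""
        total = self.hot + value
        if total >> self.hot_bits:            # rare path: hot part overflowed
            self.cold += total >> self.hot_bits
            total &= (1 << self.hot_bits) - 1
        self.hot = total

    def value(self):
        """Combine both parts into the full sum."""
        return (self.cold << self.hot_bits) | self.hot


# Tiny 8-bit hot part so the overflow path is easy to follow.
s = SplitSum(hot_bits=8)
for v in [200, 100, 255, 1]:
    s.add(v)
```

After these four additions the hot part holds 44 and the overflow counter holds 2, which combine to 2 * 256 + 44 = 556. The common case only reads and writes `hot`, which is the point of the optimistic design.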

Unique Strings Self-aligned Region (USSR)
● Assumption: Many strings repeat
● USSR
○ Query-wide dictionary
○ Limited size (cache resident)
○ Built during scan
● 768 kB in total (hash table 256 kB, data region 512 kB)
● data region
○ 2^16 slots ( 8-byte / slot )
○ each string takes at least two slots (one for the hash and one for the string)
○ all pointers inside a data region start with the same 45-bit prefix
● hash table region
○ 2^16 buckets ( 4-byte / bucket )
○ each bucket consists of a 16-bit hash extract and a 16-bit slot number
○ load factor < 50% (2^16 buckets for at most 2^15 strings)
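
The sizes above follow from simple arithmetic, and the shared 45-bit prefix gives a cheap "is this string in the USSR?" test. The sketch below assumes 64-bit pointers and a data region aligned to its own size. Which hash bits feed the bucket index and which form the 16-bit extract is not stated on the slide, so `bucket_index` and `make_bucket` are hypothetical choices.

```python
SLOT_BYTES = 8
NUM_SLOTS = 1 << 16
DATA_BYTES = SLOT_BYTES * NUM_SLOTS        # 512 KiB data region

BUCKET_BYTES = 4
NUM_BUCKETS = 1 << 16
TABLE_BYTES = BUCKET_BYTES * NUM_BUCKETS   # 256 KiB hash table region

OFFSET_BITS = DATA_BYTES.bit_length() - 1  # 19 bits address a byte in the region
PREFIX_BITS = 64 - OFFSET_BITS             # remaining pointer bits are shared


def in_ussr(ptr, base):
    """A pointer is inside the self-aligned region iff it shares base's prefix."""
    return (ptr >> OFFSET_BITS) == (base >> OFFSET_BITS)


def slot_number(ptr):
    """16-bit slot number of a pointer known to be inside the region."""
    return (ptr & (DATA_BYTES - 1)) // SLOT_BYTES


def bucket_index(hash_value):
    """Assumed: the low 16 hash bits pick the bucket."""
    return hash_value & (NUM_BUCKETS - 1)


def make_bucket(hash_value, slot):
    """Assumed: the next 16 hash bits are the extract stored next to the slot."""
    extract = (hash_value >> 16) & 0xFFFF
    return (extract << 16) | slot


base = 5 << OFFSET_BITS  # made-up, self-aligned base address
ptr = base + 1234 * SLOT_BYTES
```

Since each USSR string has exactly one address, two strings that both pass `in_ussr` can be compared for equality by comparing pointers, and the hash stored in the slot before the string can be reused instead of recomputed.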

Micro-bench: Faster HashJoin Probe
● Micro-benchmark of Domain-Guided Prefix Suppression
● 4 keys [0...1000], 4 payloads [0...10]
● 2.5x faster hash probe, including the tuple reconstruction cost
● > 10^6 rows: the speedups were caused by the more cache-resident hash table
● < 10^6 rows: the speedups mostly came from the more efficient comparisons directly on compressed data

Micro-bench: USSR and Group-By
● SELECT COUNT(*) FROM T GROUP BY s
● 10 unique strings, all strings had the same length
● the time spent on string comparisons when checking the keys inside the group-by's hash table
● the time spent on computing the hash of the string keys

TPC-H (sf = 100): memory footprint
● Over TPC-H we measured up to 2.1x lower memory consumption
● However, Optimistic Splitting in fact increases (rather than reduces) the overall memory consumption, as it introduces additional data
● The main idea behind Optimistic Splitting is to reduce memory pressure rather than overall memory consumption

TPC-H (sf = 100): query performance
● USSR alone: Q4, Q12, Q16 benefit from faster string hashing and equality comparisons
● CHT alone: improvement of at least 10%, coming from a) more efficient expression evaluation on smaller data types and b) more cache-efficient hash table operations on compressed keys
● CHT + OPTIMISTIC + USSR: Q1 and Q15 benefited from the Optimistic SUM aggregate, which sped up the aggregate computation
● Q2: the regression was caused by type-casting overhead, which occurred when operating on compact data types

Faster Real-World Workload (Public BI)
● string heavy
● “CommonGovernment” workbook:
