Efficient Query Plans for Modern Hardware

DB reading group
May 16, 2018, Keisuke Suzuki

Today’s paper
Efficiently Compiling Efficient Query Plans for Modern Hardware
● Thomas Neumann, VLDB 2011
○ Creator of HyPer
■ Main memory RDBMS for mixed OLTP and OLAP workloads
● Topic: query execution on modern CPUs

Query processing in an RDBMS
Scope of this paper

Executing relational algebra plans
Ri: relation
σ: selection
Γ: aggregation
⋈: natural join
● Executor variations
○ Compiled vs. interpreted
○ Pipelining vs. block processing
○ Pull vs. push
● ref: CMU Advanced Database Systems - 03 Query Compilation

Volcano style execution
● interpreted + pipelining + pull
● Pros
○ easy to implement
○ no materialization
● Cons
○ poor cache locality
○ costly virtual function calls (one next() call per tuple per operator)
● popular in disk-based DBMS
○ e.g. PostgreSQL
● performance much worse than hand-written code on modern systems
[Figure: operator tree. Calls go down (1. next(), 2. next()); one tuple comes back up per call (3. a tuple, 4. a tuple).]

Related work: MonetDB/X100
● interpreted + block processing + pull
● Pros
○ better locality than Volcano
● Cons
○ virtual function calls
○ unnecessary tuple copies can occur at operator boundaries,
e.g. tuples with x <> 7 in step 3
● still slower than hand-written code
[Figure: operator tree. Calls go down (1. next_chunk(), 2. next_chunk()); a chunk of tuples comes back up per call (3. chunk of tuples, 4. chunk of tuples).]
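
The block-at-a-time idea above can be sketched as follows. This is a minimal illustration, not MonetDB/X100 code: the `ChunkNode` interface, the `nextChunk()` spelling, the single `int` column and returning `null` at end of input are all assumptions made for the example.

```java
import java.util.Arrays;

// Block-at-a-time operators: each call returns a chunk of values, or null
// at end of input. Column type and chunk size are assumptions.
interface ChunkNode {
    int[] nextChunk();
}

// Leaf operator: cuts a column into chunks of at most chunkSize values.
class ChunkScan implements ChunkNode {
    private final int[] column;
    private final int chunkSize;
    private int pos = 0;

    ChunkScan(int[] column, int chunkSize) {
        this.column = column;
        this.chunkSize = chunkSize;
    }

    public int[] nextChunk() {
        if (pos >= column.length) return null;
        int end = Math.min(pos + chunkSize, column.length);
        // The chunk handed to the parent still contains values that a
        // selection above will discard: a copy at the operator boundary.
        int[] chunk = Arrays.copyOfRange(column, pos, end);
        pos = end;
        return chunk;
    }
}

// Selection "x = constant": one virtual call per chunk instead of one per
// tuple, followed by a tight loop over the chunk.
class ChunkSelectEq implements ChunkNode {
    private final ChunkNode child;
    private final int constant;

    ChunkSelectEq(ChunkNode child, int constant) {
        this.child = child;
        this.constant = constant;
    }

    public int[] nextChunk() {
        int[] in;
        while ((in = child.nextChunk()) != null) {
            int[] out = new int[in.length];
            int n = 0;
            for (int v : in) {
                if (v == constant) out[n++] = v;
            }
            if (n > 0) return Arrays.copyOf(out, n);  // skip empty results
        }
        return null;
    }
}
```

A plan such as `new ChunkSelectEq(new ChunkScan(column, 1024), 7)` is drained by calling `nextChunk()` until it returns `null`. The virtual call is still there, but its cost is spread over a whole chunk.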

Proposed method
● compiled + block processing + push
○ tuples are pushed to the next pipeline breaker (e.g. hash, aggregation, …)
● Pros
○ good locality
○ no virtual function calls
○ generated query execution code is easy to parallelize
■ SIMD
■ multithreading
[Figure: 1. loop the filter over R1 tuples]

Translate algebraic plan into code fragments
?

Translation: Pull based
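interface Node { Tuple next(); }   // returns one tuple per call
class JoinNode implements Node {
    Node left, right;              // the two inputs of the join
    public Tuple next() { .. }
}
class SelectNode implements Node {
    Node child;                    // tuples are pulled from the child
    public Tuple next() { .. }
}
class ScanNode implements Node {   // leaf: reads the base relation
    public Tuple next() { .. }
}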
● Simple pipelining of operator nodes
● Tree structure
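
The pull-based interface above can be made runnable with a few assumptions. In this sketch the `int[]` tuple layout, the `null` end-of-input convention and all class names are hypothetical; the join operator is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Pull-based operators: every next() call returns one tuple, or null when
// the input is exhausted (the null convention is an assumption).
interface PullNode {
    int[] next();
}

// Leaf operator: hands out the rows of an in-memory table one at a time.
class PullScan implements PullNode {
    private final List<int[]> rows;
    private int pos = 0;

    PullScan(List<int[]> rows) {
        this.rows = rows;
    }

    public int[] next() {
        return pos < rows.size() ? rows.get(pos++) : null;
    }
}

// Selection: keeps pulling from the child until a tuple passes the predicate.
class PullSelect implements PullNode {
    private final PullNode child;
    private final Predicate<int[]> pred;

    PullSelect(PullNode child, Predicate<int[]> pred) {
        this.child = child;
        this.pred = pred;
    }

    public int[] next() {
        int[] t;
        while ((t = child.next()) != null) {
            if (pred.test(t)) return t;
        }
        return null;
    }
}

class PullDemo {
    // Drains a plan by calling next() on the root, as an executor would.
    static List<int[]> run(PullNode root) {
        List<int[]> out = new ArrayList<>();
        int[] t;
        while ((t = root.next()) != null) out.add(t);
        return out;
    }
}
```

Each result tuple costs one interface call per operator on its path, which is the per-tuple overhead the Volcano slide lists under cons.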

Translation: Proposed method
● Not a tree structure
● Ambiguous operator boundaries
?

Producer / Consumer interface
● produce()
○ asks the operator to produce its results
● consume(attributes, source)
○ called by the source (child) operator to push its results to this operator
● Flow
1. call produce() on the root operator
2. recursively call produce() until reaching a leaf operator
3. the leaf operator generates results
4. recursively call consume() until reaching the root operator
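
The flow above can be sketched as a tiny code generator in which produce() and consume() emit text for one fused loop instead of processing tuples themselves. All class names, the emitted pseudo-code and the string-based emitter are illustrative assumptions; the `attributes` argument of consume() is omitted, and the fixed indentation only works for this linear plan.

```java
// Sketch of produce/consume as code-generation hooks.
abstract class GenOp {
    GenOp parent;  // the operator that consumes this operator's tuples

    abstract void produce(StringBuilder code);
    abstract void consume(StringBuilder code, GenOp source);
}

// Leaf: opens the loop, then pushes each tuple to its parent.
class GenScan extends GenOp {
    private final String relation;

    GenScan(String relation) {
        this.relation = relation;
    }

    void produce(StringBuilder code) {
        code.append("for each tuple t in ").append(relation).append("\n");
        parent.consume(code, this);
    }

    void consume(StringBuilder code, GenOp source) {
        throw new UnsupportedOperationException("a scan has no input");
    }
}

// Selection: forwards produce() downwards, emits an if in consume().
class GenSelect extends GenOp {
    private final GenOp child;
    private final String condition;

    GenSelect(GenOp child, String condition) {
        this.child = child;
        this.condition = condition;
        child.parent = this;
    }

    void produce(StringBuilder code) {
        child.produce(code);
    }

    void consume(StringBuilder code, GenOp source) {
        code.append("  if ").append(condition).append("\n");
        parent.consume(code, this);
    }
}

// Build side of a hash join: a pipeline breaker, so the loop ends here.
class GenHashBuild extends GenOp {
    private final GenOp child;
    private final String key;

    GenHashBuild(GenOp child, String key) {
        this.child = child;
        this.key = key;
        child.parent = this;
    }

    void produce(StringBuilder code) {
        child.produce(code);
    }

    void consume(StringBuilder code, GenOp source) {
        code.append("    materialize t in hash table on ").append(key).append("\n");
    }
}

class GenDemo {
    static String generate() {
        GenOp plan = new GenHashBuild(new GenSelect(new GenScan("R1"), "t.x = 7"), "t.a");
        StringBuilder code = new StringBuilder();
        plan.produce(code);  // step 1: call produce() on the root
        return code.toString();
    }
}
```

`GenDemo.generate()` returns three nested lines: the scan loop, the `if` of the selection, and the materialization step.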

Example
⋈{a=b}.produce
-> σ{x=7}.produce
-> scan{R1}.produce
(read tuples from R1)
-> σ{x=7}.consume
(select tuples with x = 7)
-> ⋈{a=b}.consume
(materialize tuples in hash table)

Materialization breaks the loop: the hash-table build is a pipeline breaker, so the generated loop ends there.
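
Written by hand, the code that this call sequence corresponds to might look like the single loop below. It is a sketch under assumptions: tuples are `int[]{a, x}`, the hash table is a `HashMap`, and the class and method names are hypothetical. Real generated code would be LLVM IR, not Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FusedPipeline {
    // Scan, selection and hash-table build fused into one loop.
    // Tuple layout int[]{a, x} is an assumption for this example.
    static Map<Integer, List<int[]>> buildSide(int[][] r1) {
        Map<Integer, List<int[]>> hashTable = new HashMap<>();
        for (int[] t : r1) {                 // scan R1
            if (t[1] == 7) {                 // selection x = 7
                // join a = b, build side: materialize in the hash table
                hashTable.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
            }
        }
        return hashTable;                    // pipeline breaker: loop ends
    }
}
```

No operator boundary survives inside the loop, so there are no per-tuple function calls between scan, selection and build.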

Generating Machine Code
● At first: generate C++ code -> compile -> load as a shared library
○ their system (HyPer) is written in C++
○ Bad: slow compilation (multiple seconds)
○ Bad: C++ does not offer total control over the generated code
● Next: mixed LLVM and C++ code
○ LLVM code drives and connects the operators and calls pre-compiled C++ functions for complex processing (e.g. disk I/O, memory allocation)
○ Good: fast compilation (a few milliseconds)
○ Good: generating code through LLVM is more robust than writing assembler by hand

Performance Tuning
● Branch prediction
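○ a branch that is true nearly 0% or 100% of the time is cheap
○ a branch that is true about 50% of the time is expensive
○ Example: hash-table lookup. The hash value mostly exists, but there is mostly no collision
-> a single loop condition is true on the 1st iteration and false on the 2nd
○ restructuring that branch made the code 20% faster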
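
The two loop shapes behind this point can be sketched as follows. The `Entry` chain, the method names and the summing logic are assumptions for illustration. The 20% figure is the slide's number for HyPer's generated code; whether a JVM's JIT keeps this branch layout is not guaranteed, so the sketch only shows the structure.

```java
class ChainLookup {
    // One entry of a chained hash-table bucket (layout is an assumption).
    static final class Entry {
        final int key;
        final int value;
        final Entry next;

        Entry(int key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Variant 1: one loop condition. With a non-empty bucket and no
    // collision it is true on the 1st check and false on the 2nd, so the
    // same branch flips about half of the time.
    static int sumWhile(Entry[] table, int bucket, int key) {
        int sum = 0;
        Entry iter = table[bucket];
        while (iter != null) {
            if (iter.key == key) sum += iter.value;
            iter = iter.next;
        }
        return sum;
    }

    // Variant 2: same result, two branches. The if is almost always true
    // (bucket exists), the do-while condition almost always false (no
    // collision), so each branch is easy to predict.
    static int sumIfDoWhile(Entry[] table, int bucket, int key) {
        int sum = 0;
        Entry iter = table[bucket];
        if (iter != null) {
            do {
                if (iter.key == key) sum += iter.value;
                iter = iter.next;
            } while (iter != null);
        }
        return sum;
    }
}
```

Both methods return the same value for any input; only the placement of the branches differs.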

Performance on OLTP / OLAP
● OLTP: small performance improvement
○ low selectivity (queries touch only a small number of tuples)
● OLAP: large performance improvement

Criticism: Maintainability of operator template
● Template expansion easily becomes too complex
○ The code base grows as more and more optimizations are added
○ One of the major reasons the pull (iterator) model is preferred
● Operators are written in a low-level language (LLVM IR)
Some follow-up studies address this problem
● e.g. Building Efficient Query Engines in a High-Level Language
