DB reading group
May 16, 2018 Keisuke Suzuki
Today’s paper
Efficiently Compiling Efficient Query Plans for Modern Hardware
● Thomas Neumann, VLDB 2011
○ Creator of HyPer
■ Main memory RDBMS for mixed OLTP and OLAP workloads
● Topic: query execution on modern CPUs
Query processing on RDBMS
Scope of this paper
Executing relational algebraic plans
Ri: relation
σ: selection
Γ: aggregation
⋈: natural join
● Variation of executor
○ Compiled VS Interpreted
○ Pipelining VS Block processing
○ Pull VS Push
● ref: CMU Advanced Database
Systems - 03 Query Compilation
Volcano style execution
● interpreted + pipelining + pull
● Pros
○ easy to implement
○ no materialization
● Cons
○ poor cache locality
○ high-cost virtual function calls
● popular in disk-based DBMS
○ e.g. PostgreSQL
● performance much worse than
hand-written code on modern systems
(figure: 1–2. next() calls flow down the operator tree; 3–4. single tuples flow back up)
Related work: MonetDB/X100
● interpreted + block processing + pull
● Pros
○ better locality than Volcano
● Cons
○ virtual function calls
○ unnecessary tuple copies can happen
at operator boundaries
(e.g., tuples with x <> 7 at step 3)
● still slower than hand-written code
(figure: 1–2. next_chunk() calls flow down the operator tree; 3–4. chunks of tuples flow back up)
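The chunk-at-a-time pull model above can be sketched as follows. This is a minimal illustration, not MonetDB/X100's actual API: the operator names, the `nextChunk()` signature, and the `int[]` tuple layout are all assumptions for the example. Note how `Select` copies passing tuples into a fresh list — exactly the kind of copy at operator boundaries the slide criticizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of block processing: operators exchange chunks of
// tuples, so the virtual-call cost is amortized over CHUNK_SIZE tuples.
public class BlockPipeline {
    static final int CHUNK_SIZE = 1024;

    interface Operator {
        List<int[]> nextChunk(); // null when exhausted
    }

    // Scan: emits chunks of tuples from an in-memory relation.
    static class Scan implements Operator {
        private final List<int[]> relation;
        private int pos = 0;
        Scan(List<int[]> relation) { this.relation = relation; }
        public List<int[]> nextChunk() {
            if (pos >= relation.size()) return null;
            int end = Math.min(pos + CHUNK_SIZE, relation.size());
            List<int[]> chunk = new ArrayList<>(relation.subList(pos, end));
            pos = end;
            return chunk;
        }
    }

    // Select: filters a chunk on x = 7 (tuple layout {x, id} assumed).
    // Copying survivors into a new list is the boundary copy the slide notes.
    static class Select implements Operator {
        private final Operator child;
        Select(Operator child) { this.child = child; }
        public List<int[]> nextChunk() {
            List<int[]> in = child.nextChunk();
            if (in == null) return null;
            List<int[]> out = new ArrayList<>();
            for (int[] t : in) if (t[0] == 7) out.add(t);
            return out;
        }
    }

    public static void main(String[] args) {
        List<int[]> r1 = new ArrayList<>();
        for (int i = 0; i < 10; i++) r1.add(new int[]{i % 8, i});
        Operator root = new Select(new Scan(r1));
        int count = 0;
        for (List<int[]> c = root.nextChunk(); c != null; c = root.nextChunk())
            count += c.size();
        System.out.println(count); // only i = 7 has x = 7 -> prints 1
    }
}
```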
Proposed method
● compiled + block processing + push
○ tuples are pushed to the next pipeline
breaker (e.g. hash, aggregation, …)
● Pros
○ good locality
○ no virtual function calls
○ generated query execution code is
easy to parallelize
■ SIMD
■ multi threading
(figure: 1. the filter loops directly over R1's tuples)
Translate the algebraic plan into code fragments
Translation: Pull based
interface Node { Tuple next(); }
class JoinNode implements Node {
    Node left, right;
    Tuple next() { .. }
}
class SelectNode implements Node {
    Node child;
    Tuple next() { .. }
}
class ScanNode implements Node {
    Tuple next() { .. }
}
● Simple pipelining of operator nodes
● Tree structure
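The skeleton above can be fleshed out into a small runnable pipeline. This is a sketch under assumptions: tuples are `int[]` with layout `{x, id}`, the predicate is the slide's x = 7, and the join node is omitted since its body is elided in the skeleton.

```java
import java.util.Iterator;
import java.util.List;

// Runnable sketch of the Volcano-style pull interface:
// each next() call pulls exactly one tuple through the pipeline.
public class PullPipeline {
    interface Node { int[] next(); } // returns null when exhausted

    static class ScanNode implements Node {
        private final Iterator<int[]> it;
        ScanNode(List<int[]> relation) { this.it = relation.iterator(); }
        public int[] next() { return it.hasNext() ? it.next() : null; }
    }

    static class SelectNode implements Node {
        private final Node child;
        SelectNode(Node child) { this.child = child; }
        public int[] next() {
            // keep pulling from the child until a tuple passes x = 7
            for (int[] t = child.next(); t != null; t = child.next())
                if (t[0] == 7) return t;
            return null;
        }
    }

    public static void main(String[] args) {
        List<int[]> r1 = List.of(new int[]{7, 1}, new int[]{3, 2}, new int[]{7, 3});
        Node root = new SelectNode(new ScanNode(r1));
        int count = 0;
        for (int[] t = root.next(); t != null; t = root.next()) count++;
        System.out.println(count); // two tuples have x = 7 -> prints 2
    }
}
```

Every tuple crosses each operator boundary through a (virtual) `next()` call, which is the per-tuple overhead the Volcano slide criticizes.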
Translation: Proposed method
● Not a tree structure
● Operator boundaries become ambiguous
Producer / Consumer interface
● produce()
○ asks the operator to produce results
● consume(attributes, source)
○ called to push results toward the
operator
● Flow
1. call produce() of root operator
2. recursively call produce() until
reaching leaf operator
3. leaf operator generates results
4. recursively call consume()
until reaching root operator
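The four-step flow above can be sketched like this. One caveat: in the paper, produce() and consume() *generate* code rather than process tuples; to keep the example runnable, this sketch executes the push directly. Operator names and the `int[]` tuple layout `{x, id}` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the produce()/consume() flow: produce() requests go down the
// tree, then tuples are pushed back up via consume().
public class PushPipeline {
    interface Operator {
        void produce();                              // ask for results
        void consume(int[] tuple, Operator source);  // push one tuple upward
    }

    static class Scan implements Operator {
        private final List<int[]> relation;
        Operator parent;
        Scan(List<int[]> relation) { this.relation = relation; }
        public void produce() {                      // step 3: generate results
            for (int[] t : relation) parent.consume(t, this); // step 4
        }
        public void consume(int[] t, Operator s) { throw new UnsupportedOperationException(); }
    }

    static class Select implements Operator {
        private final Operator child;
        Operator parent;
        Select(Operator child) { this.child = child; }
        public void produce() { child.produce(); }   // step 2: recurse down
        public void consume(int[] t, Operator s) {
            if (t[0] == 7) parent.consume(t, this);  // predicate x = 7
        }
    }

    static class Collect implements Operator {       // stand-in root operator
        final List<int[]> results = new ArrayList<>();
        private final Operator child;
        Collect(Operator child) { this.child = child; }
        public void produce() { child.produce(); }   // step 1: start at root
        public void consume(int[] t, Operator s) { results.add(t); }
    }

    public static void main(String[] args) {
        Scan scan = new Scan(List.of(new int[]{7, 1}, new int[]{3, 2}, new int[]{7, 3}));
        Select sel = new Select(scan);
        Collect root = new Collect(sel);
        scan.parent = sel; sel.parent = root;
        root.produce();
        System.out.println(root.results.size()); // prints 2
    }
}
```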
Example
⋈{a=b}.produce
-> σ{x=7}.produce
-> scan{R1}.produce
(read tuples from R1)
-> σ{x=7}.consume
(select tuples with x = 7)
-> ⋈{a=b}.consume
(materialize tuples in hash table)
Materialization breaks the loop: the hash join is a pipeline breaker
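The code generated for the trace above roughly fuses each pipeline into one tight loop, with the hash-join build as the only materialization point. The sketch below shows the shape of that generated code (in Java rather than LLVM IR, with an assumed tuple layout R1 = {x, a}, R2 = {b, y}); it is an illustration of the technique, not HyPer's actual output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Shape of generated code for  join{a=b}( select{x=7}(R1), R2 ):
// pipeline 1 ends at the pipeline breaker (hash table build),
// pipeline 2 probes it and emits join results.
public class GeneratedJoin {
    public static List<int[]> run(List<int[]> r1, List<int[]> r2) {
        // pipeline 1: scan R1, filter x = 7, materialize into hash table on a
        HashMap<Integer, List<int[]>> ht = new HashMap<>();
        for (int[] t : r1)
            if (t[0] == 7)
                ht.computeIfAbsent(t[1], k -> new ArrayList<>()).add(t);

        // pipeline 2: scan R2, probe hash table on b, emit joined tuples
        List<int[]> out = new ArrayList<>();
        for (int[] s : r2) {
            List<int[]> matches = ht.get(s[0]);
            if (matches != null)
                for (int[] t : matches)
                    out.add(new int[]{t[0], t[1], s[0], s[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> r1 = List.of(new int[]{7, 1}, new int[]{3, 1}, new int[]{7, 2});
        List<int[]> r2 = List.of(new int[]{1, 10}, new int[]{2, 20}, new int[]{5, 50});
        System.out.println(run(r1, r2).size()); // two matches -> prints 2
    }
}
```

Within each loop, a tuple stays in registers from scan to breaker; no virtual calls or per-operator copies occur inside the pipeline.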
Generating Machine Code
● At first: generate C++ code -> compile -> load as a shared library
○ their system (HyPer) is written in C++
○ Bad: slow compilation (multiple seconds)
○ Bad: C++ does not offer total control over the generated code
● Next: mixed LLVM IR and C++ code
○ drive and connect operators with LLVM, calling pre-compiled C++
functions for complex processing (e.g. disk I/O, memory allocation)
○ good: fast compilation (a few milliseconds)
○ good: LLVM produces more robust assembly than manual writing
Performance Tuning
● Branch prediction
○ a branch that is nearly 0% or 100% true is cheap
○ a branch that is ~50% true is expensive
● example: when probing a hash table, an entry for the hash value
usually exists, but collisions are rare
-> restructure the probe so the 1st iteration's check is almost always
true and the 2nd iteration's almost always false (~20% faster)
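One way to picture this branch restructuring is the hash-chain probe below. This is a hedged sketch of the general idea, not the paper's exact code: splitting one mixed loop condition into an `if` (almost always true: an entry exists) plus a `do/while` back-edge (almost always false: no collision chain) gives the predictor two near-deterministic branches instead of one 50/50 branch. Both variants compute the same result.

```java
// Sketch of branch-layout tuning for a hash-chain probe.
public class BranchLayout {
    static class Entry { int key, value; Entry next; }

    // naive: a single while loop; its loop branch mixes the frequent
    // "entry exists" case with the rare "collision chain" case
    static int probeNaive(Entry e, int key) {
        int sum = 0;
        while (e != null) {
            if (e.key == key) sum += e.value;
            e = e.next;
        }
        return sum;
    }

    // tuned: the outer if is almost always true (hash value exists);
    // the do/while back-edge is almost always false (collisions are rare)
    static int probeTuned(Entry e, int key) {
        int sum = 0;
        if (e != null) {
            do {
                if (e.key == key) sum += e.value;
                e = e.next;
            } while (e != null);
        }
        return sum;
    }

    public static void main(String[] args) {
        Entry a = new Entry(); a.key = 5; a.value = 10;
        Entry b = new Entry(); b.key = 9; b.value = 20;
        a.next = b; // a small collision chain
        System.out.println(probeNaive(a, 5) == probeTuned(a, 5)); // prints true
    }
}
```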
Performance on OLTP / OLAP
● OLTP: small performance improvement
○ queries touch only a small number of tuples
● OLAP: big performance improvement
Criticism: Maintainability of operator template
● Template expansion easily becomes too complex
○ the code base grows as more and more optimizations are added
○ one of the major reasons the pull (iterator) model is preferred
● the generated code is in a low-level language (LLVM IR), which is hard to read and maintain
Follow-up studies address this problem
● e.g. Building Efficient Query Engines in a High-Level Language
