2. Today’s paper
Efficiently Compiling Efficient Query Plans for Modern Hardware
● Thomas Neumann, VLDB 2011
○ Creator of HyPer
■ Main memory RDBMS for mixed OLTP and OLAP workloads
● Topic: query execution on modern CPUs
4. Executing relational algebraic plans
Ri: relation
σ: selection
Γ: aggregation
⋈: natural join
● Variations of executor design
○ Compiled VS Interpreted
○ Pipelining VS Block processing
○ Pull VS Push
● ref: CMU Advanced Database Systems - 03 Query Compilation
5. Volcano style execution
● interpreted + pipelining + pull
● Pros
○ easy to implement
○ no materialization
● Cons
○ poor cache locality
○ high cost of virtual function calls
● popular in disk-based DBMS
○ e.g. PostgreSQL
● performance is much worse than hand-written code on modern systems
(Figure: each operator calls next() on its child (1, 2) and gets back a single tuple (3, 4).)
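The pull-based iterator model on this slide can be sketched roughly as follows. This is a minimal illustration, not code from the paper or from PostgreSQL: the class names `Operator`, `Scan`, `Select`, the helper `run`, and the dict-based tuple layout are all assumptions made for the example.

```python
from typing import Optional


class Operator:
    """Hypothetical iterator interface: every operator exposes next()."""

    def next(self) -> Optional[dict]:
        raise NotImplementedError


class Scan(Operator):
    """Leaf operator: returns one stored tuple per next() call."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        # None signals that the input is exhausted.
        return next(self._it, None)


class Select(Operator):
    """Selection: pulls from its child until a tuple passes the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        # One dynamically dispatched call per input tuple.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


def run(root):
    """Drive the plan by repeatedly pulling from the root operator."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out
```

For example, `run(Select(Scan(rows), lambda r: r["x"] == 7))` pulls the tuples of `rows` one at a time through the selection; every tuple costs at least one `next()` call per operator, which is the overhead listed under Cons.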
6. Related work: MonetDB/X100
● interpreted + block processing + pull
● Pros
○ better locality than Volcano
● Cons
○ virtual function calls
○ unnecessary tuple copies can happen at operator boundaries
■ e.g. tuples with x <> 7 at step 3
● still slower than hand-written code
(Figure: each operator calls next_chunk() on its child (1, 2) and gets back a chunk of tuples (3, 4).)
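A chunk-at-a-time variant of the same pull interface might look like the sketch below. It is only an illustration of the idea, not MonetDB/X100 code: `ChunkScan`, `ChunkSelect`, `run_chunks`, and the use of plain Python lists as chunks are assumptions.

```python
class ChunkScan:
    """Hypothetical leaf operator: hands out the input in fixed-size chunks."""

    def __init__(self, values, chunk_size):
        self.values = list(values)
        self.chunk_size = chunk_size
        self.pos = 0

    def next_chunk(self):
        if self.pos >= len(self.values):
            return None  # input exhausted
        chunk = self.values[self.pos:self.pos + self.chunk_size]
        self.pos += self.chunk_size
        return chunk


class ChunkSelect:
    """Selection over chunks: one call per chunk instead of one per tuple."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_chunk(self):
        while (chunk := self.child.next_chunk()) is not None:
            # Surviving values are copied into a new chunk at the boundary.
            selected = [v for v in chunk if self.predicate(v)]
            if selected:
                return selected
        return None


def run_chunks(root):
    """Pull chunks from the root and flatten them into one result list."""
    out = []
    while (chunk := root.next_chunk()) is not None:
        out.extend(chunk)
    return out
```

The call overhead is amortized over a whole chunk, but each operator still materializes its output chunk before handing it to the parent.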
7. Proposed method
● compiled + block processing + push
○ tuples are pushed to the next pipeline breaker (e.g. hash, aggregation, …)
● Pros
○ good locality
○ no virtual function calls
○ generated query execution code is easy to parallelize
■ SIMD
■ multithreading
(Figure: step 1 loops the filter over the R1 tuples.)
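To make the push idea concrete, here is a rough sketch of what fused code for one pipeline could look like. The query (a filter on R1 feeding a count-per-group aggregation), the column layout `(x, z)`, and the function name `fused_pipeline` are assumptions chosen for illustration; the real system emits LLVM IR, not Python.

```python
from collections import defaultdict


def fused_pipeline(r1):
    """Hypothetical fused code for: count(*) grouped by z over R1 where x = 7."""
    groups = defaultdict(int)   # pipeline breaker: aggregation hash table
    for x, z in r1:             # scan R1 in one tight loop
        if x == 7:              # selection inlined as a plain branch
            groups[z] += 1      # tuple pushed straight into the aggregation
    return dict(groups)
```

There are no operator objects and no per-tuple function calls between scan, selection, and aggregation; a tuple is only materialized when it reaches the pipeline breaker.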
14. Generating Machine Code
● First attempt: generate C++ code -> compile -> load as a shared library
○ their system (HyPer) is written in C++
○ Bad: slow compilation (multiple seconds)
○ Bad: C++ does not offer total control over the generated code
● Next: mix LLVM and C++ code
○ drive and connect operators with LLVM-generated code, and call pre-compiled C++ functions for complex processing (e.g. disk I/O, memory allocation)
○ Good: fast compilation (a few milliseconds)
○ Good: LLVM makes it easier to produce robust assembly than writing it by hand
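The generate -> compile -> load cycle can be imitated in a few lines. In this sketch Python's built-in `compile` and `exec` stand in for the C++ compiler or LLVM, which is an assumption made purely for illustration; `generate_filter_count` and `compile_query` are hypothetical helper names, and the query shape is made up for the example.

```python
def generate_filter_count(column: str, constant: int) -> str:
    """Emit source for: SELECT count(*) FROM rows WHERE <column> = <constant>."""
    return (
        "def query(rows):\n"
        "    count = 0\n"
        "    for row in rows:\n"
        f"        if row[{column!r}] == {constant!r}:\n"
        "            count += 1\n"
        "    return count\n"
    )


def compile_query(source: str):
    """Compile the generated source and return the resulting callable."""
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]
```

Calling `compile_query(generate_filter_count("x", 7))` returns a function specialized to that one query: the column name and the constant are baked into the generated code instead of being interpreted at run time.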
15. Performance Tuning
● Branch prediction
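○ a branch that is nearly 0% or 100% true is cheap
○ a branch that is 50% true is expensive
○ e.g. hash table lookup loop: the hash value mostly exists but mostly has no collision
■ the loop condition is true on the 1st iteration and false on the 2nd, so it is hard to predict
■ restructuring the branch made the code 20% faster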
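The two control-flow shapes behind the hash lookup example can be sketched as below. Python will not show the branch prediction effect itself; the sketch only illustrates how one hard-to-predict branch is split into two predictable ones. The `Entry` class and both function names are assumptions for the example.

```python
class Entry:
    """Hypothetical hash bucket entry with a collision chain pointer."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


def lookup_single_branch(bucket, key):
    """One loop condition: true on iteration 1, false on iteration 2."""
    matches = []
    entry = bucket
    while entry is not None:
        if entry.key == key:
            matches.append(entry.value)
        entry = entry.next
    return matches


def lookup_split_branch(bucket, key):
    """Two branches: 'bucket is non-empty' and 'chain has ended'."""
    matches = []
    entry = bucket
    if entry is not None:          # almost always true: the value exists
        while True:
            if entry.key == key:
                matches.append(entry.value)
            entry = entry.next
            if entry is None:      # almost always true: no collision
                break
    return matches
```

Both functions return the same matches; only the second gives each branch a strongly skewed outcome, which is the property a hardware branch predictor benefits from in compiled code.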
16. Performance on OLTP / OLAP
● OLTP: small performance improvement
○ low selectivity (each query touches only a small number of tuples)
● OLAP: big performance improvement
17. Criticism: Maintainability of operator templates
● Template expansion easily becomes too complex
○ The code base grows as more and more optimizations are added
○ One of the major reasons that the pull (iterator) model is preferred
● Operators must be written in a low-level language (LLVM IR)
● Some follow-up studies address this problem
○ e.g. Building Efficient Query Engines in a High-Level Language