VLDB 2009 Tutorial 
Column-Oriented Database Systems 
Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) 
Column-Oriented 
Database Systems 
Part 1: Stavros Harizopoulos (HP Labs) 
Part 2: Daniel Abadi (Yale) 
Part 3: Peter Boncz (CWI) 
What is a column-store? 
row-store: Date, Store, Product, Customer, Price stored together, record by record 
+ easy to add/modify a record 
- might read in unnecessary data 
column-store: Date, Store, Product, Customer, Price each stored as a separate column 
+ only need to read in relevant data 
- tuple writes require multiple accesses 
=> suitable for read-mostly, read-intensive, large data repositories 
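The tradeoff above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values and the helper names (`total_price_row_store`, `total_price_column_store`, `append_record`) are hypothetical and do not come from any system discussed in the tutorial.

```python
# Toy table; column names follow the slide, the values are made up.
names = ("date", "store", "product", "customer", "price")
rows = [
    ("01/01", "BOS", "Table", "Mesa", 20),
    ("01/01", "NYC", "Chair", "Lutz", 13),
    ("01/01", "BOS", "Bed", "Mudd", 79),
]

# Column-store layout: one list per attribute.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

def total_price_row_store(rows):
    # Whole records are fetched although only the price field is needed.
    return sum(row[4] for row in rows)

def total_price_column_store(columns):
    # Only the price column is read.
    return sum(columns["price"])

def append_record(columns, record):
    # One logical insert turns into one write per column.
    for name, value in zip(names, record):
        columns[name].append(value)
```

Reading one attribute touches a single list in the columnar layout, while adding a record touches all five, which is the read-mostly bias the slide points out.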
Are these two fundamentally different? 
l The only fundamental difference is the storage layout 
l However: we need to look at the big picture 
[Figure: timeline from the ‘70s to today. Different storage layouts are proposed; row-stores become row-stores++ as new applications and a new bottleneck in hardware appear; column-stores emerge. Do the two converge?] 
l How did we get here, and where are we heading? (Part 1) 
l What are the column-specific optimizations? (Part 2) 
l How do we improve CPU efficiency when operating on columns? (Part 3) 
Outline 
l Part 1: Basic concepts — Stavros 
l Introduction to key features 
l From DSM to column-stores and performance tradeoffs 
l Column-store architecture overview 
l Will rows and columns ever converge? 
l Part 2: Column-oriented execution — Daniel 
l Part 3: MonetDB/X100 and CPU efficiency — Peter 
Telco Data Warehousing example 
l Typical DW installation 
l Real-world example 
[Figure: star schema. Fact table: usage. Dimension tables: source, account, toll.] 
“One Size Fits All? - Part 2: Benchmarking Results” Stonebraker et al. CIDR 2007 
QUERY 2 
SELECT account.account_number, 
sum (usage.toll_airtime), 
sum (usage.toll_price) 
FROM usage, toll, source, account 
WHERE usage.toll_id = toll.toll_id 
AND usage.source_id = source.source_id 
AND usage.account_id = account.account_id 
AND toll.type_ind in ('AE', 'AA') 
AND usage.toll_price > 0 
AND source.type != 'CIBER' 
AND toll.rating_method = 'IS' 
AND usage.invoice_date = 20051013 
GROUP BY account.account_number 
Column-store Row-store 
Query 1 2.06 300 
Query 2 2.20 300 
Query 3 0.09 300 
Query 4 5.24 300 
Query 5 2.88 300 
Why? Three main factors (next slides) 
Telco example explained (1/3): 
read efficiency 
row store: read pages containing entire rows 
one row = 212 columns! 
is this typical? (it depends) 
column store: read only columns needed 
in this example: 7 columns 
caveats: 
• “select * ” not any faster 
• clever disk prefetching 
• clever tuple reconstruction 
What about vertical partitioning? (it does not work with ad-hoc queries) 
Telco example explained (2/3): 
compression efficiency 
l Columns compress better than rows 
l Typical row-store compression ratio 1 : 3 
l Column-store 1 : 10 
l Why? 
l Rows contain values from different domains 
=> more entropy, difficult to dense-pack 
l Columns exhibit significantly less entropy 
l Examples: 
Male, Female, Female, Female, Male 
1998, 1998, 1999, 1999, 1999, 2000 
l Caveat: CPU cost (use lightweight compression) 
Telco example explained (3/3): 
sorting & indexing efficiency 
l Compression and dense-packing free up space 
l Use multiple overlapping column collections 
l Sorted columns compress better 
l Range queries are faster 
l Use sparse clustered indexes 
What about heavily-indexed row-stores? 
(works well for single column access, 
cross-column joins become increasingly expensive) 
Additional opportunities for column-stores 
l Block-tuple / vectorized processing (Part 3) 
l Easier to build block-tuple operators 
l Amortizes function-call cost, improves CPU cache performance 
l Easier to apply vectorized primitives 
l Software-based: bitwise operations 
l Hardware-based: SIMD 
l Opportunities with compressed columns (more in Part 2) 
l Avoid decompression: operate directly on compressed data 
l Delay decompression (and tuple reconstruction) 
l Also known as: late materialization 
l Exploit columnar storage in other DBMS components 
l Physical design (both static and dynamic) 
See: Database Cracking, from CWI 
Effect on C-Store performance 
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008. 
[Figure: bar chart of Time (sec), average for SSBM queries on C-store. Bars: original C-store; enable late materialization; enable compression & operate on compressed; column-oriented join algorithm.] 
Summary of column-store key features 
l Storage layout: columnar storage, header/ID elimination, compression, multiple sort orders 
l Execution engine: column operators, avoid decompression, late materialization, vectorized operations (Part 3) 
l Design tools, optimizer 
Each feature is covered in Part 1, Part 2, or Part 3. 
Outline 
l Part 1: Basic concepts — Stavros 
l Introduction to key features 
l From DSM to column-stores and performance tradeoffs 
l Column-store architecture overview 
l Will rows and columns ever converge? 
l Part 2: Column-oriented execution — Daniel 
l Part 3: MonetDB/X100 and CPU efficiency — Peter 
From DSM to Column-stores 
70s -1985: 
TOD: Time Oriented Database – Wiederhold et al. 
"A Modular, Self-Describing Clinical Databank 
System," Computers and Biomedical Research, 1975 
More 1970s: Transposed files, Lorie, Batory, 
Svensson. 
“An overview of cantor: a new system for data analysis” 
Karasalo, Svensson, SSDBM 1983 
1985: DSM paper 
“A decomposition storage model” 
Copeland and Khoshafian. SIGMOD 1985. 
1990s: Commercialization through SybaseIQ 
Late 90s – 2000s: Focus on main-memory performance 
l DSM “on steroids” [1997 – now]: CWI: MonetDB 
l Hybrid DSM/NSM [2001 – 2004]: Wisconsin: PAX, Fractured Mirrors; Michigan: Data Morphing; CMU: Clotho 
2005 – : Re-birth of read-optimized DSM as “column-store” 
MIT: C-Store; CWI: MonetDB/X100; 10+ startups 
The original DSM paper 
“A decomposition storage 
model” Copeland and 
Khoshafian. SIGMOD 1985. 
l Proposed as an alternative to NSM 
l 2 indexes: clustered on ID, non-clustered on value 
l Speeds up queries projecting few columns 
l Requires more storage 
[Figure: a DSM sub-relation of (ID, value) pairs, with IDs 1 2 3 4 .. and values 0100 0962 1000 ..] 
Memory wall and PAX 
l 90s: Cache-conscious research 
from: “Cache Conscious Algorithms for Relational Query Processing.” Shatdal, Kant, Naughton. VLDB 1994. 
to: “Database Architecture Optimized for the New Bottleneck: Memory Access.” Boncz, Manegold, Kersten. VLDB 1999. 
and: “DBMSs on a modern processor: Where does time go?” Ailamaki, DeWitt, Hill, Wood. VLDB 1999. 
l PAX: Partition Attributes Across 
l Retains NSM I/O pattern 
l Optimizes cache-to-RAM communication 
“Weaving Relations for Cache Performance.” Ailamaki, DeWitt, Hill, Skounakis, VLDB 2001. 
More hybrid NSM/DSM schemes 
l Dynamic PAX: Data Morphing 
“Data morphing: an adaptive, cache-conscious 
storage technique.” Hankins, Patel, VLDB 2003. 
l Clotho: custom layout using scatter-gather I/O 
“Clotho: Decoupling Memory Page Layout from Storage Organization.” 
Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 2004. 
l Fractured mirrors 
l Smart mirroring with both NSM/DSM copies 
“A Case For Fractured 
Mirrors.” Ramamurthy, 
DeWitt, Su, VLDB 2002. 
MonetDB (more in Part 3) 
l Late 1990s, CWI: Boncz, Manegold, and Kersten 
l Motivation: 
l Main-memory 
l Improve computational efficiency by avoiding expression 
interpreter 
l DSM with virtual IDs natural choice 
l Developed new query execution algebra 
l Initial contributions: 
l Pointed out memory-wall in DBMSs 
l Cache-conscious projections and joins 
l … 
2005: the (re)birth of column-stores 
l New hardware and application realities 
l Faster CPUs, larger memories, disk bandwidth limit 
l Multi-terabyte Data Warehouses 
l New approach: combine several techniques 
l Read-optimized, fast multi-column access, 
disk/CPU efficiency, light-weight compression 
l C-store paper: 
l First comprehensive design description of a column-store 
l MonetDB/X100 
l “proper” disk-based column store 
l Explosion of new products 
Performance tradeoffs: columns vs. rows 
DSM traditionally was not favored by technology trends 
How has this changed? 
l Optimized DSM in “Fractured Mirrors,” 2002 
l “Apples-to-apples” comparison 
“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06 
l Follow-up study 
“Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08 
l Main-memory DSM vs. NSM 
“DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08 
l Flash-disks: a come-back for PAX? 
“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09 
“Fast Scans and Joins Using Flash Drives” Shah, Harizopoulos, Wiener, Graefe. DaMoN’08 
Fractured mirrors: a closer look 
l Store DSM relations inside a B-tree 
l Leaf nodes contain values 
l Eliminate IDs, amortize header overhead 
l Custom implementation on Shore 
“A Case For Fractured Mirrors” Ramamurthy, DeWitt, Su, VLDB 2002. 
[Figure: regular DSM stores a tuple header, a TID, and the column data for every value (1 a1, 2 a2, 3 a3, 4 a4, 5 a5). The optimized layout uses a sparse B-tree on ID whose leaves hold one ID followed by a run of values (1 a1 a2 a3, 4 a4 a5).] 
Similar: “Efficient columnar storage in B-trees” Graefe. Sigmod Record 03/2007. 
storage density comparable to column stores 
Fractured mirrors: performance 
l Chunk-based tuple merging 
l Read in segments of M pages 
l Merge segments in memory 
l Becomes CPU-bound after 5 pages 
[Figure: time vs. columns projected (1 2 3 4 5) for row, regular DSM (from PAX paper), and optimized DSM; “column?” marks where a column-store might fall.] 
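Column-scanner implementation 
“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06 
SELECT name, age 
WHERE age > 40 
[Figure: row scanner vs. column scanner. The row scanner (S) reads whole tuples (1 Joe 45, 2 Sue 37, …), applies the predicate(s), and outputs (Joe, 45). The column scanner first scans the age column (45, 37, …) and applies predicate #1, then passes the qualifying positions and values (#POS 45) to a scan of the name column (Joe, Sue, …), and outputs (Joe, 45). Direct I/O; prefetch ~100ms worth of data.] 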
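A minimal Python sketch of the column-scanner idea, assuming in-memory lists in place of disk-resident columns. The function names `scan_column` and `fetch` are hypothetical; the data is the two-row example from the slide.

```python
def scan_column(values, predicate):
    """Scan one column and return the positions whose value passes the predicate."""
    return [pos for pos, value in enumerate(values) if predicate(value)]

def fetch(values, positions):
    """Read another column only at the qualifying positions."""
    return [values[pos] for pos in positions]

name = ["Joe", "Sue"]
age = [45, 37]

# SELECT name, age WHERE age > 40
positions = scan_column(age, lambda a: a > 40)
result = list(zip(fetch(name, positions), fetch(age, positions)))
```

The name column is only touched at positions that survived the predicate on age, which is where the column scanner saves work compared with reading whole rows.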
Scan performance 
l Large prefetch hides disk seeks in columns 
l Column-CPU efficiency with lower selectivity 
l Row-CPU suffers from memory stalls 
l Memory stalls disappear in narrow tuples 
l Compression: similar to narrow (not shown, details in the paper) 
Even more results 
“Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08 
• Same engine as before 
• Additional findings 
wide attributes: same as before 
narrow & compressed tuple: CPU-bound! 
[Figure: Time (s), 0 to 35, vs. Columns Returned, 1 to 10, with series C-25%, C-10%, and R-50%.] 
Non-selective queries, narrow tuples, favor well-compressed rows 
Materialized views are a win 
Scan times determine early materialized joins 
Column-joins are covered in part 2! 
Speedup of columns over rows 
“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06 
[Figure: speedup of columns over rows as a function of tuple width (8 to 36) and cycles per disk byte (cpdb: 9, 18, 36, 72, 144), with regions marked _, =, +, ++, +++.] 
l Rows favored by narrow tuples and low cpdb 
l Disk-bound workloads have higher cpdb 
Varying prefetch size 
[Figure: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 32, with no competing disk traffic. Series: Row (any prefetch size) and Column with prefetch sizes 2, 8, 16, and 48 (x 128KB).] 
l No prefetching hurts columns in single scans 
Varying prefetch size with competing disk traffic 
[Figure: two panels of time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 28. One panel compares Column, 48 with Row, 48; the other compares Column, 8 with Row, 8.] 
l No prefetching hurts columns in single scans 
l Under competing traffic, columns outperform rows for any prefetch size 
CPU Performance 
“DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08 
l Benefit in on-the-fly conversion between NSM and DSM 
l DSM: sequential access (block fits in L2), random in L1 
l NSM: random access, SIMD for grouped Aggregation 
New storage technology: Flash SSDs 
l Performance characteristics 
l very fast random reads, slow random writes 
l fast sequential reads and writes 
l Price per bit (capacity follows) 
l cheaper than RAM, order of magnitude more expensive than Disk 
l Flash Translation Layer introduces unpredictability 
l avoid random writes! 
l Form factors not ideal yet 
l SSD (=> small reads still suffer from SATA overhead/OS limitations) 
l PCI card (=> high price, limited expandability) 
l Boost Sequential I/O in a simple package 
l Flash RAID: very tight bandwidth/cm3 packing (4GB/sec inside the box) 
l Column Store Updates 
l useful for delta structures and logs 
l Random I/O on flash fixes unclustered index access 
l still suboptimal if I/O block size > record size 
l therefore column stores profit much less than horizontal stores 
l Random I/O useful to exploit secondary, tertiary table orderings 
l the larger the data, the deeper clustering one can exploit 
Even faster column scans on flash SSDs 
l New-generation SSDs 
30K Read IOps, 3K Write IOps 
250MB/s Read BW, 200MB/s Write 
l Very fast random reads, slower random writes 
l Fast sequential RW, comparable to HDD arrays 
l No expensive seeks across columns 
l FlashScan and FlashJoin: PAX on SSDs, inside Postgres 
“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09 
mini-pages with no qualified attributes are not accessed 
Column-scan performance over time 
regular DSM (2001): from 7x slower 
optimized DSM (2002): ..to 2x slower 
column-store (2006): ..to 1.2x slower ..to same 
SSD Postgres/PAX (2009): and 3x faster! 
Outline 
l Part 1: Basic concepts — Stavros 
l Introduction to key features 
l From DSM to column-stores and performance tradeoffs 
l Column-store architecture overview 
l Will rows and columns ever converge? 
l Part 2: Column-oriented execution — Daniel 
l Part 3: MonetDB/X100 and CPU efficiency — Peter 
Architecture of a column-store 
storage layout 
l read-optimized: dense-packed, compressed 
l organize in extents, batch updates 
l multiple sort orders 
l sparse indexes 
engine 
l block-tuple operators 
l new access methods 
l optimized relational system-level operators 
l system-wide column support 
l loading / updates 
l scaling through multiple nodes 
l transactions / redundancy 
C-Store 
l Compress columns 
l No alignment 
l Big disk blocks 
“C-Store: A Column-Oriented DBMS.” Stonebraker et al. VLDB 2005. 
l Only materialized views (perhaps many) 
l Focus on Sorting not indexing 
l Data ordered on anything, not just time 
l Automatic physical DBMS design 
l Optimize for grid computing 
l Innovative redundancy 
l Xacts – but no need for Mohan 
l Column optimizer and executor 
C-Store: only materialized views (MVs) 
l Projection (MV) is some number of columns from a fact table 
l Plus columns in a dimension table – with a 1-n join between 
Fact and Dimension table 
l Stored in order of a storage key(s) 
l Several may be stored! 
l With a permutation, if necessary, to map between them 
l Table (as the user specified it and sees it) is not stored! 
l No secondary indexes (they are a one column sorted MV plus 
a permutation, if you really want one) 
User view: 
EMP (name, age, salary, dept) 
Dept (dname, floor) 
Possible set of MVs: 
MV-1 (name, dept, floor) in floor order 
MV-2 (salary, age) in age order 
MV-3 (dname, salary, name) in salary order 
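As a rough sketch of how such projections could be derived, the following Python builds MV-1 and MV-2 from the user-view tables. The rows, the `dept` lookup, and the helper `build_projection` are hypothetical, and the permutation that maps between differently sorted projections is not modeled.

```python
# User view; the rows are made up for illustration.
emp = [  # (name, age, salary, dept)
    ("Ann", 30, 50, "Toy"),
    ("Bob", 25, 40, "Shoe"),
    ("Cy", 41, 70, "Toy"),
]
dept = {"Toy": 2, "Shoe": 1}  # dname -> floor

def build_projection(tuples, sort_index):
    """Sort tuples on one attribute, then store each attribute as its own column."""
    ordered = sorted(tuples, key=lambda t: t[sort_index])
    return [list(col) for col in zip(*ordered)]

# MV-1 (name, dept, floor) in floor order: pre-joins EMP with Dept.
mv1 = build_projection([(n, d, dept[d]) for n, _, _, d in emp], sort_index=2)

# MV-2 (salary, age) in age order.
mv2 = build_projection([(s, a) for _, a, s, _ in emp], sort_index=1)
```

Each projection is a set of columns sharing one sort order, so `mv1` and `mv2` generally list the same employees in different positions.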
Continuous Load and Query (Vertica) 
Hybrid Storage Architecture 
> Write Optimized Store (WOS) 
• Memory based 
• Unsorted / Uncompressed 
• Segmented 
• Low latency / Small quick inserts 
> Read Optimized Store (ROS) 
• On disk 
• Sorted / Compressed 
• Segmented 
• Large data loaded direct 
[Figure: Trickle Load puts columns A B C into the WOS; the TUPLE MOVER (Asynchronous Data Transfer) moves them to the ROS as (A B C | A).] 
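The WOS/ROS split can be illustrated with a toy single-column store. This is a sketch under stated assumptions, not Vertica's implementation: the class `HybridStore` and its methods are hypothetical, the ROS is modeled as a sorted run-length list, and the tuple mover is called synchronously instead of running in the background.

```python
class HybridStore:
    """Toy WOS/ROS pair for a single integer column."""

    def __init__(self):
        self.wos = []  # unsorted, uncompressed, cheap appends
        self.ros = []  # sorted runs of (value, count)

    def insert(self, value):
        self.wos.append(value)

    def move_tuples(self):
        """Merge the WOS into the ROS, keeping the ROS sorted and run-length encoded."""
        merged = sorted(self.wos + [v for v, n in self.ros for _ in range(n)])
        self.ros = []
        for v in merged:
            if self.ros and self.ros[-1][0] == v:
                self.ros[-1] = (v, self.ros[-1][1] + 1)
            else:
                self.ros.append((v, 1))
        self.wos = []

    def count(self, value):
        """Queries consult both stores."""
        in_ros = sum(n for v, n in self.ros if v == value)
        return in_ros + self.wos.count(value)
```

Inserts stay cheap because they only append to the WOS, while reads pay a small cost for checking both stores until the next move.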
Loading Data (Vertica) 
> INSERT, UPDATE, DELETE 
> Bulk and Trickle Loads 
• COPY 
• COPY DIRECT 
> User loads data into logical Tables 
> Vertica loads atomically into storage 
[Figure: Write-Optimized Store (WOS), in-memory; Automatic Tuple Mover; Read-Optimized Store (ROS), on-disk.] 
Applications for column-stores 
l Data Warehousing 
l High end (clustering) 
l Mid end/Mass Market 
l Personal Analytics 
l Data Mining 
l E.g. Proximity 
l Google BigTable 
l RDF 
l Semantic web data management 
l Information retrieval 
l Terabyte TREC 
l Scientific datasets 
l SciDB initiative 
l SLOAN Digital Sky Survey on MonetDB 
List of column-store systems 
l Cantor (history) 
l Sybase IQ 
l SenSage (former Addamark Technologies) 
l Kdb 
l 1010data 
l MonetDB 
l C-Store/Vertica 
l X100/VectorWise 
l KickFire 
l SAP Business Accelerator 
l Infobright 
l ParAccel 
l Exasol 
Outline 
l Part 1: Basic concepts — Stavros 
l Introduction to key features 
l From DSM to column-stores and performance tradeoffs 
l Column-store architecture overview 
l Will rows and columns ever converge? 
l Part 2: Column-oriented execution — Daniel 
l Part 3: MonetDB/X100 and CPU efficiency — Peter 
Simulate a Column-Store inside a Row-Store 
Date Store Product Customer Price 
01/01 BOS Table Mesa $20 
01/01 NYC Chair Lutz $13 
01/01 BOS Bed Mudd $79 
Option A: Vertical Partitioning: one (TID, Value) table per column (Date, Store, Product, Customer, Price), each with TIDs 1, 2, 3 
Option B: Index Every Column: Date Index, Store Index, … 
Simulate a Column-Store inside a Row-Store 
Date Store Product Customer Price 
01/01 BOS Table Mesa $20 
01/01 NYC Chair Lutz $13 
01/01 BOS Bed Mudd $79 
Option A: Vertical Partitioning 
Store (TID, Value): (1, BOS), (2, NYC), (3, BOS) 
Product (TID, Value): (1, Table), (2, Chair), (3, Bed) 
Customer (TID, Value): (1, Mesa), (2, Lutz), (3, Mudd) 
Price (TID, Value): (1, $20), (2, $13), (3, $79) 
Date (Value, StartPos, Length): (01/01, 1, 3) 
Can explicitly run-length encode date 
“Teaching an Old Elephant New Tricks.” Bruno, CIDR 2009. 
Option B: Index Every Column: Date Index, Store Index, … 
Experiments 
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008. 
l Star Schema Benchmark (SSBM) 
“Adjoined Dimension Column Index (ADC Index) to Improve Star Schema Query Performance”. O’Neil et al. ICDE 2008. 
l Implemented by professional DBA 
l Original row-store plus 2 column-store simulations on same row-store product 
Average time (seconds): 
Normal Row-Store: 25.7 
Vertically Partitioned Row-Store: 79.9 
Row-Store With All Indexes: 221.2 
What’s Going On? Vertical Partitions 
l Vertical partitions in row-stores: 
l Work well when workload is known 
l ..and queries access disjoint sets of columns 
l See automated physical design 
l Do not work well as full-columns 
l TupleID overhead significant 
l Excessive joins 
“Column-Stores vs. Row-Stores: How Different Are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008. 
[Figure: each row of a vertical partition holds a tuple header, a TID (1, 2, 3), and the column data.] 
Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns 
Complete fact table takes up ~4 GB (compressed) 
Vertically partitioned tables take up 0.7-1.1 GB (compressed) 
What’s Going On? All Indexes Case 
l Tuple construction 
l Common type of query: 
SELECT store_name, SUM(revenue) 
FROM Facts, Stores 
WHERE Facts.store_id = Stores.store_id 
AND Stores.country = 'Canada' 
GROUP BY store_name 
l Result of lower part of query plan is a set of TIDs that passed 
all predicates 
l Need to extract SELECT attributes at these TIDs 
l BUT: index maps value to TID 
l You really want to map TID to value (i.e., a vertical partition) 
=> Tuple construction is SLOW 
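The mismatch between the two mappings can be shown with a small Python sketch. The three-row table and both function names are hypothetical; dictionaries stand in for an index (value to TIDs) and a vertical partition (TID to value).

```python
# Hypothetical three-row fact table column, stored two ways.
revenue_index = {20: [1], 13: [2], 79: [3]}  # index: value -> TIDs
revenue_partition = {1: 20, 2: 13, 3: 79}    # vertical partition: TID -> value

def values_via_index(index, tids):
    """Without a TID -> value mapping, every index entry has to be inspected."""
    wanted = set(tids)
    found = {}
    for value, entry_tids in index.items():
        for tid in entry_tids:
            if tid in wanted:
                found[tid] = value
    return [found[tid] for tid in tids]

def values_via_partition(partition, tids):
    """With a TID -> value mapping, each TID is a direct lookup."""
    return [partition[tid] for tid in tids]
```

Both functions return the same values, but the index-based one walks the whole index to answer a question about a few TIDs, which mirrors why tuple construction is slow in the all-indexes case.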
So…. 
l All indexes approach is a poor way to simulate a column-store 
l Problems with vertical partitioning are NOT fundamental 
l Store tuple header in a separate partition 
l Allow virtual TIDs 
l Combine clustered indexes, vertical partitioning 
l So can row-stores simulate column-stores? 
l Might be possible, BUT: 
l Need better support for vertical partitioning at the storage layer 
l Need support for column-specific optimizations at the executor level 
l Full integration: buffer pool, transaction manager, .. 
l When will this happen? 
l Most promising features = soon (see Part 2, Part 3 for most promising features) 
l ..unless new technology / new objectives change the game (SSDs, Massively Parallel Platforms, Energy-efficiency) 
End of Part 1 
l Basic concepts — Stavros 
l Introduction to key features 
l From DSM to column-stores and performance tradeoffs 
l Column-store architecture overview 
l Will rows and columns ever converge? 
l Part 2: Column-oriented execution — Daniel 
l Part 3: MonetDB/X100 and CPU efficiency — Peter 
Part 2 Outline 
l Compression 
l Tuple Materialization 
l Joins 
Column-Oriented 
Database Systems 
Compression 
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06 
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi, Madden, and Ferreira, SIGMOD ’06 
“Query optimization in compressed database systems” Chen, Gehrke, Korn, SIGMOD’01 
Compression 
l Trades I/O for CPU 
l Increased column-store opportunities: 
l Higher data value locality in column stores 
l Techniques such as run length encoding far more useful 
l Can use extra space to store multiple copies of data in different sort orders 
Run-length Encoding 
Uncompressed (Quarter, Product ID, Price): 
Q1 1 5 
Q1 1 7 
Q1 1 2 
Q1 1 9 
Q1 1 6 
Q1 2 8 
Q1 2 5 
… 
Q2 1 3 
Q2 1 8 
Q2 1 1 
Q2 2 4 
… 
Run-length encoded: 
Quarter as (value, start_pos, run_length): (Q1, 1, 300), (Q2, 301, 350), (Q3, 651, 500), (Q4, 1151, 600) 
Product ID as (value, start_pos, run_length): (1, 1, 5), (2, 6, 2), …, (1, 301, 3), (2, 304, 1), … 
Price (left as is): 5, 7, 2, 9, 6, 8, 5, …, 3, 8, 1, 4, … 
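A minimal Python sketch of run-length encoding with the slide's (value, start_pos, run_length) triples. The function names are hypothetical, and positions are 1-based to match the slide.

```python
def rle_encode(column):
    """Return (value, start_pos, run_length) triples with 1-based positions."""
    runs = []
    for pos, value in enumerate(column, start=1):
        if runs and runs[-1][0] == value:
            v, start, length = runs[-1]
            runs[-1] = (v, start, length + 1)
        else:
            runs.append((value, pos, 1))
    return runs

def rle_decode(runs):
    """Expand the triples back into the original column."""
    return [value for value, _, length in runs for _ in range(length)]

def rle_count(runs, wanted):
    """Count matching rows directly on the compressed form."""
    return sum(length for value, _, length in runs if value == wanted)
```

For the first seven Product ID values on the slide, `rle_encode([1, 1, 1, 1, 1, 2, 2])` yields `[(1, 1, 5), (2, 6, 2)]`, and `rle_count` answers a count query without expanding the runs.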
Bit-vector Encoding 
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06 
Product ID: 1 1 1 1 1 2 2 … 1 1 2 3 … 
ID: 1: 1 1 1 1 1 0 0 … 1 1 0 0 … 
ID: 2: 0 0 0 0 0 1 1 … 0 0 1 0 … 
ID: 3: 0 0 0 0 0 0 0 … 0 0 0 1 … 
…: 0 0 0 0 0 0 0 … 0 0 0 0 … 
l For each unique value, v, in column c, create bit-vector b 
l b[i] = 1 if c[i] = v 
l Good for columns with few unique values 
l Each bit-vector can be further compressed if sparse 
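The rule b[i] = 1 if c[i] = v translates into a short Python sketch. Lists of 0/1 stand in for packed bit-vectors, and the function names are hypothetical.

```python
def bitvector_encode(column):
    """Map each distinct value v to a list b with b[i] = 1 if column[i] == v."""
    vectors = {}
    for i, value in enumerate(column):
        vectors.setdefault(value, [0] * len(column))[i] = 1
    return vectors

def select_any(vectors, values):
    """Evaluate `column IN values` by OR-ing bit-vectors; return matching positions."""
    n = len(next(iter(vectors.values()), []))
    result = [0] * n
    for v in values:
        for i, bit in enumerate(vectors.get(v, [])):
            result[i] |= bit
    return [i for i, bit in enumerate(result) if bit]
```

A predicate over several values becomes a bitwise OR of the corresponding vectors, so the original column never has to be reconstructed to evaluate it.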
Dictionary Encoding 
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06 
Quarter: Q1 Q2 Q4 Q1 Q3 Q1 Q1 Q1 Q2 Q4 Q3 Q3 … 
Encoded Quarter: 0 1 3 0 2 0 0 0 1 3 2 2 … 
+ Dictionary: 0: Q1, 1: Q2, 2: Q3, 3: Q4 
OR 
Encoded Quarter: 24 128 122 … 
+ Dictionary: 24: Q1, Q2, Q4, Q1; 128: Q3, Q1, Q1, Q1; 122: Q2, Q4, Q3, Q3; … 
l For each unique value create dictionary entry 
l Dictionary can be per-block or per-column 
l Column-stores have the advantage that dictionary entries may encode multiple values at once 
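A small Python sketch of the single-value dictionary variant, using the Quarter column from the slide. The function names are hypothetical, and assigning codes in sorted value order is an assumption made here so the result lines up with the slide's dictionary.

```python
def dictionary_encode(column):
    """Replace each value by a small integer code; codes follow sorted value order."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return [code_of[value] for value in column], dictionary

def dictionary_decode(codes, dictionary):
    """Look every code up in the dictionary to recover the original values."""
    return [dictionary[code] for code in codes]

quarter = ["Q1", "Q2", "Q4", "Q1", "Q3", "Q1", "Q1", "Q1", "Q2", "Q4", "Q3", "Q3"]
codes, dictionary = dictionary_encode(quarter)
```

With four distinct quarters each code fits in 2 bits. The multi-value variant on the slide would instead assign one code to each group of four consecutive values.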
Frame Of Reference Encoding 
Price: 45 54 48 55 51 53 40 50 49 62 52 50 … 
Encoded Price (Frame: 50, 4 bits per value): -5 4 -2 5 1 3 ∞ 40 0 -1 ∞ 62 2 0 … 
Exceptions: ∞ 40 and ∞ 62 (see part 3 for a better way to deal with exceptions) 
l Encodes values as b bit offset from chosen frame of reference 
l Special escape code (e.g. all bits set to 1) indicates a difference larger than can be stored in b bits 
l After escape code, original (uncompressed) value is written 
“Compressing Relations and Indexes” Goldstein, Ramakrishnan, Shaft, ICDE’98 
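The scheme can be sketched in Python as follows. Several details are assumptions made for illustration: offsets are treated as signed with a symmetric range, the sentinel object `ESCAPE` stands in for the reserved bit pattern, and the output is a Python list instead of a packed bit stream.

```python
ESCAPE = object()  # stands in for the reserved escape bit pattern

def for_encode(column, frame, bits):
    """Frame-of-reference encode with signed b-bit offsets and inline exceptions."""
    # One bit pattern is reserved for the escape code, so offsets span -limit..limit.
    limit = 2 ** (bits - 1) - 1
    out = []
    for value in column:
        offset = value - frame
        if -limit <= offset <= limit:
            out.append(offset)
        else:
            out.extend([ESCAPE, value])  # escape code, then the raw value
    return out

def for_decode(encoded, frame):
    """Rebuild the column; a raw value follows every escape code."""
    out, it = [], iter(encoded)
    for item in it:
        out.append(next(it) if item is ESCAPE else frame + item)
    return out
```

Applied to the slide's Price column with frame 50 and 4 bits, the values 40 and 62 fall outside the offset range and are emitted as exceptions, matching the encoded column shown above.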
Differential Encoding 
Time (original): 5:00 5:02 5:03 5:03 5:04 5:06 5:07 5:08 5:10 5:15 5:16 5:16 …
Time (encoded), 2 bits per value: 5:00 2 1 0 1 2 1 1 2 ∞ 5:15 1 0 …
Exception: ∞ 5:15 (see part 3 for a better way to deal with exceptions)
• Encodes values as a b-bit offset from the previous value
• A special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bits
  • After the escape code, the original (uncompressed) value is written
• Performs well on columns containing increasing/decreasing sequences
  • inverted lists
  • timestamps
  • object IDs
  • sorted / clustered columns
“Improved Word-Aligned Binary Compression for Text Indexing” Anh, Moffat, TKDE ’06
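A minimal sketch of differential encoding with an escape code. It assumes non-negative deltas (an increasing column such as the Time example, with times expressed as minutes), keeps one code per `int32_t` slot without bit-packing, and uses the hypothetical names `delta_encode` and `delta_decode`.

```c
#include <stddef.h>
#include <stdint.h>

/* Differential encoding sketch with b-bit codes (assume 2 <= b <= 30).
 * The first value is stored raw; each later value is stored as its
 * difference from the previous one if 0 <= diff < escape, otherwise as
 * the escape code (all b bits set) followed by the raw value. */
size_t delta_encode(const int32_t *col, size_t n, unsigned b, int32_t *out)
{
    const int32_t escape = (int32_t)((1u << b) - 1u);
    size_t w = 0;
    if (n == 0)
        return 0;
    out[w++] = col[0];
    for (size_t i = 1; i < n; i++) {
        int32_t d = col[i] - col[i - 1];
        if (d >= 0 && d < escape) {
            out[w++] = d;
        } else {
            out[w++] = escape;   /* exception marker */
            out[w++] = col[i];   /* original (uncompressed) value */
        }
    }
    return w;                    /* number of slots written */
}

size_t delta_decode(const int32_t *in, size_t slots, unsigned b, int32_t *col)
{
    const int32_t escape = (int32_t)((1u << b) - 1u);
    size_t r = 0, k = 0;
    if (slots == 0)
        return 0;
    col[k++] = in[r++];
    while (r < slots) {
        int32_t code = in[r++];
        if (code == escape) {
            if (r == slots)
                break;           /* truncated input */
            col[k++] = in[r++];
        } else {
            col[k] = col[k - 1] + code;
            k++;
        }
    }
    return k;                    /* number of values decoded */
}
```

With b = 2 the codes 0, 1, 2 are ordinary deltas and 3 is the escape, so the jump from 5:10 to 5:15 is written as an exception, matching the slide.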
What Compression Scheme To Use? 
Does column appear in the sort key?
  yes: Is the average run-length > 2?
    yes: RLE
    no: Differential Encoding
  no: Is the number of unique values < ~50000?
    yes: Does this column appear frequently in selection predicates?
      yes: Bit-vector Compression
      no: Dictionary Compression
    no: Is the data numerical, and does it exhibit good locality?
      yes: Frame of Reference Encoding
      no: Leave Data Uncompressed OR Heavyweight Compression
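The decision tree reads naturally as a small function. The sketch below assumes the branch order as reconstructed from the flattened flowchart (sort key first, then cardinality, then predicate frequency or locality); the type and field names are hypothetical, and the thresholds are the approximate ones given on the slide.

```c
#include <stdbool.h>

typedef enum {
    SCHEME_RLE,
    SCHEME_DIFFERENTIAL,
    SCHEME_BIT_VECTOR,
    SCHEME_DICTIONARY,
    SCHEME_FRAME_OF_REFERENCE,
    SCHEME_UNCOMPRESSED_OR_HEAVYWEIGHT
} scheme_t;

/* Hypothetical per-column statistics assumed to be available. */
typedef struct {
    bool in_sort_key;
    double avg_run_length;
    long num_unique_values;
    bool frequent_in_predicates;
    bool numeric_with_good_locality;
} column_stats_t;

scheme_t choose_scheme(const column_stats_t *c)
{
    if (c->in_sort_key)
        return c->avg_run_length > 2.0 ? SCHEME_RLE : SCHEME_DIFFERENTIAL;
    if (c->num_unique_values < 50000)   /* "~50000" on the slide */
        return c->frequent_in_predicates ? SCHEME_BIT_VECTOR
                                         : SCHEME_DICTIONARY;
    return c->numeric_with_good_locality ? SCHEME_FRAME_OF_REFERENCE
                                         : SCHEME_UNCOMPRESSED_OR_HEAVYWEIGHT;
}
```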
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE ’06
Heavy-Weight Compression Schemes
• Modern disk arrays can achieve > 1GB/s
• Spending only 1/3 of CPU time on decompression => a decompression speed of 3GB/s is needed
=> Lightweight compression schemes are better
=> Even better: operate directly on compressed data
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
Operating Directly on Compressed Data
• I/O - CPU tradeoff is no longer a tradeoff
• Reduces memory-CPU bandwidth requirements
• Opens up possibility of operating on multiple records at once
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
Operating Directly on Compressed Data
SELECT ProductID, Count(*)
FROM table
WHERE (Quarter = Q2)
GROUP BY ProductID
Quarter (RLE, as (value, start position, run length)): (Q1, 1, 300) (Q2, 301, 6) (Q3, 307, 500) (Q4, 807, 600)
Index Lookup + Offset jump: Quarter = Q2 covers positions 301-306
Product ID (bit-vectors):
ID 1: … 1 0 0 1 1 0 0 1 0 0 0 …
ID 2: … 0 0 1 0 0 0 1 0 0 1 0 …
ID 3: … 0 1 0 0 0 1 0 0 1 0 1 …
Result (ProductID, COUNT(*)): (1, 3) (2, 1) (3, 2)
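The query can be answered without decompressing either column: look up the run for Q2 in the RLE-encoded Quarter column, then count the set bits of each Product ID bit-vector inside that position range. A minimal sketch, assuming 1-based positions, one byte per bit, and hypothetical names (`rle_run_t`, `rle_find`, `bitvec_count`):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One RLE run: (value, start position, run length), positions 1-based. */
typedef struct {
    const char *value;
    size_t start;
    size_t length;
} rle_run_t;

/* "Index lookup": find the run holding the given value, or NULL. */
const rle_run_t *rle_find(const rle_run_t *runs, size_t num_runs,
                          const char *value)
{
    for (size_t i = 0; i < num_runs; i++)
        if (strcmp(runs[i].value, value) == 0)
            return &runs[i];
    return NULL;
}

/* "Offset jump": count set bits in positions [start, start + length).
 * bits[0] corresponds to position 1; one byte per bit in this sketch. */
size_t bitvec_count(const uint8_t *bits, size_t start, size_t length)
{
    size_t count = 0;
    for (size_t pos = start; pos < start + length; pos++)
        count += bits[pos - 1];
    return count;
}
```

COUNT(*) for each ProductID is then `bitvec_count(id_bits, run->start, run->length)`, one call per bit-vector.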
“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
Operating Directly on Compressed Data
SELECT ProductID, Count(*)
FROM table
WHERE (Quarter = Q2)
GROUP BY ProductID
Data (RLE): (Q1, 1, 300) (Q2, 301, 6) (Q3, 307, 500) (Q4, 807, 600)
Operators: Compression-Aware Scan Operator, Selection Operator, Aggregation Operator
Block API (between the operators and the Data):
  isOneValue()
  isValueSorted()
  isPosContiguous()
  isSparse()
  getNext()
  decompressIntoArray()
  getValueAtPosition(pos)
  getMin()
  getMax()
  getSize()
Tuple Materialization and Column-Oriented Join Algorithms
“Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.
“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, and Graefe. SIGMOD 2009.
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
“Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
“Self-organizing tuple reconstruction in column-stores”, Idreos, Manegold, Kersten, SIGMOD’09
When should columns be projected? 
• Where should column projection operators be placed in a query plan?
• Row-store:
  • Column projection involves removing unneeded columns from tuples
  • Generally done as early as possible
• Column-store:
  • Operation is almost completely opposite from a row-store
  • Column projection involves reading needed columns from storage and extracting values for a listed set of tuples
    § This process is called “materialization”
  • Early materialization: project columns at beginning of query plan
    § Straightforward since there is a one-to-one mapping across columns
  • Late materialization: wait as long as possible before projecting columns
    § More complicated since selection and join operators on one column obfuscate the mapping to other columns from the same table
  • Most column-stores construct tuples at column projection time
    § Many database interfaces expect output in regular tuples (rows)
    § Rest of discussion will focus on this case
When should tuples be constructed? 
Select + Aggregate 
QUERY:
SELECT custID,SUM(price)
FROM table
WHERE (prodID = 4) AND
(storeID = 1)
GROUP BY custID
• Solution 1: Create rows first (EM). But:
  • Need to construct ALL tuples
  • Need to decompress data
  • Poor memory bandwidth utilization
Columns (prodID is stored compressed as the RLE triple (4,1,4)), combined by Construct into rows:
prodID  storeID  custID  price
4       2        2       7
4       1        3       13
4       3        3       42
4       1        3       80
Solution 2: Operate on columns 
QUERY:
SELECT custID,SUM(price)
FROM table
WHERE (prodID = 4) AND
(storeID = 1)
GROUP BY custID
Plan: Data Source (prodID) and Data Source (storeID) feed AND; the output of AND drives Data Source (custID) and Data Source (price), which feed AGG.
Columns: prodID = 4 4 4 4, storeID = 2 1 3 1, custID = 2 3 3 3, price = 7 13 42 80
Step 1: each Data Source applies its predicate and emits a bit-vector of matching positions:
  prodID = 4:  1 1 1 1
  storeID = 1: 0 1 0 1
Step 2: AND intersects the bit-vectors: 1 1 1 1 AND 0 1 0 1 = 0 1 0 1
Step 3: the Data Sources for custID and price extract only the values at the selected positions:
  custID with 0 1 0 1: 3 3
  price with 0 1 0 1: 13 80
Step 4: AGG sums price per custID: (3, 93)
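The four steps of Solution 2 can be sketched directly on arrays. This is an illustration under simplifying assumptions, not the tutorial authors' implementation: columns are plain uncompressed `int` arrays, bit-vectors use one byte per position, group keys are small non-negative integers used as array indexes, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: predicate col[i] == constant, emitted as a bit-vector. */
void select_eq(const int *col, size_t n, int constant, uint8_t *bits)
{
    for (size_t i = 0; i < n; i++)
        bits[i] = (uint8_t)(col[i] == constant);
}

/* Step 2: AND two bit-vectors position by position. */
void bitmap_and(const uint8_t *a, const uint8_t *b, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(a[i] & b[i]);
}

/* Steps 3 and 4: touch group/value only at selected positions and
 * accumulate SUM(value) per group. sums must be zero-initialized and
 * hold num_groups entries; group keys outside that range are skipped. */
void sum_by_group(const uint8_t *selected, const int *group,
                  const int *value, size_t n,
                  long *sums, size_t num_groups)
{
    for (size_t i = 0; i < n; i++)
        if (selected[i] && group[i] >= 0 && (size_t)group[i] < num_groups)
            sums[group[i]] += value[i];
}
```

Only the two predicate columns are read in full; custID and price are consulted just for the positions that survive the AND, and no row is ever constructed.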
“Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.
For plans without joins, late materialization is a win
QUERY:
SELECT C1, SUM(C2)
FROM table
WHERE (C1 < CONST) AND
(C2 < CONST)
GROUP BY C1
• Ran on 2 compressed columns from TPC-H scale 10 data
[Bar chart: Time (seconds), axis 0 to 10, Early Materialization vs. Late Materialization at low, medium, and high selectivity]
“Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.
Even on uncompressed data, late materialization is still a win
QUERY:
SELECT C1, SUM(C2)
FROM table
WHERE (C1 < CONST) AND
(C2 < CONST)
GROUP BY C1
• Materializing late still works best
[Bar chart: Time (seconds), axis 0 to 14, Early Materialization vs. Late Materialization at low, medium, and high selectivity]
What about for plans with joins? 
Select R1.B, R1.C, R2.E, R2.H, R3.F
From R1, R2, R3
Where R1.A = R2.D AND R2.G = R3.K
Early materialization plan:
  Scan R1 (A, B, C) and Scan R2 (D, E, G, H) feed Join 1 (A = D), which outputs B, C, E, G, H
  Join 1 and Scan R3 (F, K, L), which passes on F, K, feed Join 2 (G = K), which outputs B, C, E, H, F
Late materialization plan:
  Scan R1 passes only A and Scan R2 only D into Join (A = D); Project then fetches B, C, E, H and G
  G and K (from Scan R3) feed Join (G = K); Project then fetches F, giving B, C, E, H, F
Early Materialization Example 
QUERY:
SELECT C.lastName,SUM(F.price)
FROM facts AS F, customers AS C
WHERE F.custID = C.custID
GROUP BY C.lastName
Facts columns (prodID is stored as the RLE triple (4,1,4)), combined by Construct into rows:
prodID  storeID  quantity  custID  price
4       12       6         2       7
4       1        1         3       13
4       11       2         3       42
4       1        1         3       80
Customers columns, combined by Construct into rows:
custID  lastName
1       Green
2       White
3       Brown
Join on custID, then pass lastName and price on:
custID  price  lastName
2       7      White
3       13     Brown
3       42     Brown
3       80     Brown
Late Materialization Example 
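QUERY:
SELECT C.lastName,SUM(F.price)
FROM facts AS F, customers AS C
WHERE F.custID = C.custID
GROUP BY C.lastName
Facts columns (prodID is stored as the RLE triple (4,1,4)):
prodID  storeID  quantity  custID  price
4       12       6         2       7
4       1        1         3       13
4       11       2         1       42
4       1        1         3       80
Customers columns:
custID  lastName
1       Green
2       White
3       Brown
Join on the custID columns only, producing a pair of position lists:
  Facts positions: 1 2 3 4 (in order, used to fetch price: 7 13 42 80)
  Customers positions: 2 3 1 3 (out of order, used to fetch lastName)
Late materialized join causes out of order probing of projected columns from the inner relation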
Late Materialized Join Performance 
• Naïve LM join about 2X slower than EM join on typical queries (due to random I/O)
  • This number is very dependent on
    • Amount of memory available
    • Number of projected attributes
    • Join cardinality
• But we can do better
  • Invisible Join
  • Jive/Flash Join
  • Radix cluster/decluster join
Invisible Join 
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
• Designed for typical joins when data is modeled using a star schema
  • One (“fact”) table is joined with multiple dimension tables
• Typical query:
select c_nation, s_nation, d_year,
sum(lo_revenue) as revenue
from customer, lineorder, supplier, date
where lo_custkey = c_custkey
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and c_region = 'ASIA'
and s_region = 'ASIA'
and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, revenue desc;
Invisible Join 
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
Apply “region = ‘Asia’” On Customer Table
custkey  region  nation  …
1        ASIA    CHINA   …
2        ASIA    INDIA   …
3        ASIA    INDIA   …
4        EUROPE  FRANCE  …
=> Hash Table (or bit-map) Containing Keys 1, 2 and 3
Apply “region = ‘Asia’” On Supplier Table
suppkey  region  nation  …
1        ASIA    RUSSIA  …
2        EUROPE  SPAIN   …
3        ASIA    JAPAN   …
=> Hash Table (or bit-map) Containing Keys 1, 3
Apply “year in [1992,1997]” On Date Table
dateid    year  …
01011997  1997  …
01021997  1997  …
01031997  1997  …
=> Hash Table Containing Keys 01011997, 01021997, and 01031997
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi et al. SIGMOD 2008.
Original Fact Table
orderkey  custkey  suppkey  orderdate  revenue
1         3        1        01011997   43256
2         3        2        01011997   33333
3         4        3        01021997   12121
4         1        1        01021997   23233
5         4        2        01021997   45456
6         1        2        01031997   43251
7         3        2        01031997   34235
Probe each foreign-key column against the matching hash table to get a bit-vector:
custkey (3 3 4 1 4 1 3) + Hash Table Containing Keys 1, 2 and 3 => 1 1 0 1 0 1 1
suppkey (1 2 3 1 2 2 2) + Hash Table Containing Keys 1 and 3 => 1 0 1 1 0 0 0
orderdate (01011997 01011997 01021997 01021997 01021997 01031997 01031997) + Hash Table Containing Keys 01011997, 01021997, and 01031997 => 1 1 1 1 1 1 1
1 1 0 1 0 1 1 & 1 0 1 1 0 0 0 & 1 1 1 1 1 1 1 = 1 0 0 1 0 0 0
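This probe-and-intersect phase can be sketched as follows. The sketch assumes dimension keys are small non-negative integers, so the "hash table (or bit-map)" is modeled as a plain lookup array `key_ok`; the function names are hypothetical and positions are 0-based.

```c
#include <stddef.h>
#include <stdint.h>

/* Probe a fact-table foreign-key column against the set of dimension
 * keys that passed the predicate. key_ok[k] != 0 means key k qualifies
 * (stands in for the hash table or bit-map on the slide). */
void probe_keys(const int *fk, size_t n,
                const uint8_t *key_ok, size_t num_keys, uint8_t *bits)
{
    for (size_t i = 0; i < n; i++)
        bits[i] = (uint8_t)(fk[i] >= 0 && (size_t)fk[i] < num_keys
                            && key_ok[fk[i]] != 0);
}

/* Intersect three bit-vectors and return the surviving fact-table
 * positions (0-based, in position order). */
size_t intersect_positions(const uint8_t *a, const uint8_t *b,
                           const uint8_t *c, size_t n, size_t *positions)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] && b[i] && c[i])
            positions[count++] = i;
    return count;
}
```

On the slide's data the surviving positions are the first and fourth fact rows (0 and 3 when counted from zero); only those rows are then used to fetch values from the dimension tables.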
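Extract the foreign keys at the positions set in the bit-vector 1 0 0 1 0 0 0, then look up the dimension columns:
custkey (3 3 4 1 4 1 3) + 1 0 0 1 0 0 0 => 3, 1; + nation (CHINA, INDIA, INDIA, FRANCE) = INDIA, CHINA
suppkey (1 2 3 1 2 2 2) + 1 0 0 1 0 0 0 => 1, 1; + nation (RUSSIA, SPAIN, JAPAN) = RUSSIA, RUSSIA
orderdate (01011997 01011997 01021997 01021997 01021997 01031997 01031997) + 1 0 0 1 0 0 0 => 01011997, 01021997; JOIN with dateid/year (01011997: 1997, 01021997: 1997, 01031997: 1997) = 1997, 1997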
Invisible Join 
“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
The same three predicates are applied to the Customer, Supplier, and Date tables as in the earlier Invisible Join example, producing the same key sets. One addition: because the qualifying customer keys 1, 2 and 3 are contiguous, that key set can be expressed as Range [1-3] (between-predicate rewriting) instead of a Hash Table (or bit-map).
Invisible Join 
• Bottom Line
  • Many data warehouses model data using star/snowflake schemas
  • Joins of one (fact) table with many dimension tables are common
  • Invisible join takes advantage of this by making sure that the table that can be accessed in position order is the fact table for each join
  • Position lists from the fact table are then intersected (in position order)
  • This reduces the amount of data that must be accessed out of order from the dimension tables
  • “Between-predicate rewriting” trick not relevant for this discussion
[Same value-extraction figure as in the earlier Invisible Join example, with one annotation on the dimension-table lookup (custkey 3, 1 into the nation column CHINA, INDIA, INDIA, FRANCE):]
Still accessing table out of order
Jive/Flash Join 
Position list (3, 1) + nation column (CHINA, INDIA, INDIA, FRANCE) = INDIA, CHINA
Still accessing table out of order
“Fast Joins using Join Indices”. Li and Ross, VLDBJ 8:1-24, 1999.
“Query Processing Techniques for Solid State Drives”. Tsirogiannis, Harizopoulos et al. SIGMOD 2009.
Jive/Flash Join 
1. Add column with dense ascending integers from 1
2. Sort new position list by second column
3. Probe projected column in order using new sorted position list, keeping first column from position list around
4. Sort new result by first column
Example with position list (3, 1) and projected column (CHINA, INDIA, INDIA, FRANCE):
After step 1: (1, 3), (2, 1)
After step 2: (2, 1), (1, 3)
After step 3: (2, CHINA), (1, INDIA)
After step 4: (1, INDIA), (2, CHINA)
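The four steps map onto two sorts around one in-order probe. The sketch below is a simplified in-memory illustration using the C library's `qsort`, not the algorithm as implemented in the cited papers; positions are 0-based here, and the names `jive_entry_t` and `jive_probe` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t seq;         /* step 1: dense ascending sequence number */
    size_t pos;         /* position to probe in the projected column */
    const char *value;  /* value fetched in step 3 */
} jive_entry_t;

static int by_pos(const void *a, const void *b)
{
    const jive_entry_t *x = a, *y = b;
    return (x->pos > y->pos) - (x->pos < y->pos);
}

static int by_seq(const void *a, const void *b)
{
    const jive_entry_t *x = a, *y = b;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* positions: join output positions into column (0-based).
 * work: caller-provided scratch array of n entries.
 * out:  receives the projected values in the original join order. */
void jive_probe(const size_t *positions, size_t n,
                const char *const *column,
                jive_entry_t *work, const char **out)
{
    for (size_t i = 0; i < n; i++) {             /* step 1 */
        work[i].seq = i;
        work[i].pos = positions[i];
        work[i].value = NULL;
    }
    qsort(work, n, sizeof work[0], by_pos);      /* step 2 */
    for (size_t i = 0; i < n; i++)               /* step 3: in-order probe */
        work[i].value = column[work[i].pos];
    qsort(work, n, sizeof work[0], by_seq);      /* step 4 */
    for (size_t i = 0; i < n; i++)
        out[i] = work[i].value;
}
```

With positions {2, 0} (the slide's 3, 1 counted from zero) and the column CHINA, INDIA, INDIA, FRANCE, the column is probed at 0 and then 2, and the output comes back in join order as INDIA, CHINA.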
Jive/Flash Join 
• Bottom Line
  • Instead of probing projected columns from inner table out of order:
    • Sort join index
    • Probe projected columns in order
    • Sort result using an added column
  • LM vs EM tradeoffs:
    • LM has the extra sorts (EM accesses all columns in order)
    • LM only has to fit join columns into memory (EM needs join columns and all projected columns)
      § Results in big memory and CPU savings (see part 3 for why there are CPU savings)
    • LM only has to materialize relevant columns
    • In many cases LM advantages outweigh disadvantages
  • LM would be a clear winner if not for those pesky sorts … can we do better?
Radix Cluster/Decluster 
• The full sort from the Jive join is actually overkill
  • We just want to access the storage blocks in order (we don’t mind random access within a block)
  • So do a radix sort and stop early
  • By stopping early, data within each block is accessed out of order, but in the order specified in the original join index
    • Use this pseudo-order to accelerate the post-probe sort as well
“Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
“Database Architecture Optimized for the New Bottleneck: Memory Access”, VLDB’99
“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02
(all Manegold, Boncz, Kersten)
Radix Cluster/Decluster 
• Bottom line
  • Both sorts from the Jive join can be significantly reduced in overhead
  • Only been tested when there is sufficient memory for the entire join index to be stored three times
    • Technique is likely applicable to larger join indexes, but utility will go down a little
  • Only works if random access within a storage block is possible
    • Don’t want to use radix cluster/decluster if you have variable-width column values or compression schemes that can only be decompressed starting from the beginning of the block
LM vs EM joins 
• Invisible, Jive, Flash, Cluster, Decluster techniques contain a bag of tricks to improve LM joins
• Research papers show that LM joins become 2X faster than EM joins (instead of 2X slower) for a wide array of query types
Tuple Construction Heuristics 
• For queries with selective predicates, aggregations, or compressed data, use late materialization
• For joins:
  • Research papers:
    • Always use late materialization
  • Commercial systems:
    • Inner table to a join often materialized before join (reduces system complexity)
    • Some systems will use LM only if columns from inner table can fit entirely in memory
Outline 
• Computational Efficiency of DB on modern hardware
  • how column-stores can help here
• Keynote revisited: MonetDB & VectorWise in more depth
• CPU efficient column compression
  • vectorized decompression
• Conclusions
  • future work
40 years of hardware evolution vs. DBMS computational efficiency
CPU Architecture 
Elements:
• Storage
  • CPU caches L1/L2/L3
• Registers
• Execution Unit(s)
  • Pipelined
  • SIMD
CPU Metrics 
DRAM Metrics 
Super-Scalar Execution (pipelining) 
Hazards 
• Data hazards
  • Dependencies between instructions
  • L1 data cache misses
• Control Hazards
  • Branch mispredictions
  • Computed branches (late binding)
  • L1 instruction cache misses
Result: bubbles in the pipeline
Out-of-order execution addresses data hazards
• control hazards typically more expensive
SIMD 
• Single Instruction Multiple Data
  • Same operation applied on a vector of values
  • MMX: 64 bits, SSE: 128 bits, AVX: 256 bits
  • SSE, e.g. multiply 8 short integers
A Look at the Query Pipeline 
SELECT id, name,
(age-30)*50 AS bonus
FROM employee
WHERE age > 30
Pipeline: SCAN => SELECT (age > 30 ?) => PROJECT ((age - 30) * 50); each operator pulls tuples from the one below it with next()
  SCAN emits (101, alice, 22) and (102, ivan, 37)
  SELECT: 22 > 30 ? FALSE; 37 > 30 ? TRUE, so only (102, ivan, 37) passes
  PROJECT: 37 - 30 = 7, 7 * 50 = 350, giving (102, ivan, 350)
Operators
  Iterator interface
  - open()
  - next(): tuple
  - close()
Primitives
  Provide computational functionality
  All arithmetic allowed in expressions, e.g. Multiplication
  7 * 50
  mult(int,int) => int
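A tuple-at-a-time pipeline of this kind can be sketched with function pointers standing in for the late-bound next() calls. This is a minimal illustration for the example query only: open() and close() are omitted, the predicate and expression are hard-coded rather than interpreted, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int id; const char *name; int age; int bonus; } tuple_t;

typedef struct op op_t;
struct op {
    bool (*next)(op_t *self, tuple_t *out); /* late-bound call per tuple */
    op_t *child;                            /* operator to pull from */
    const tuple_t *rows;                    /* SCAN only: the table */
    size_t n, cursor;                       /* SCAN only: size, position */
};

/* SCAN: hand out the stored tuples one at a time. */
bool scan_next(op_t *self, tuple_t *out)
{
    if (self->cursor >= self->n)
        return false;
    *out = self->rows[self->cursor++];
    return true;
}

/* SELECT: pull from the child until a tuple satisfies age > 30. */
bool select_next(op_t *self, tuple_t *out)
{
    while (self->child->next(self->child, out))
        if (out->age > 30)
            return true;
    return false;
}

/* PROJECT: compute bonus = (age - 30) * 50 for each tuple pulled. */
bool project_next(op_t *self, tuple_t *out)
{
    if (!self->child->next(self->child, out))
        return false;
    out->bonus = (out->age - 30) * 50;
    return true;
}
```

Every result tuple costs one indirect call per operator plus the per-tuple expression work, which is the overhead the column-at-a-time and vectorized designs in this part of the tutorial aim to amortize.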
Database Architecture causes Hazards 
“DBMSs On A Modern Processor: Where Does Time Go?” Ailamaki, DeWitt, Hill, Wood, VLDB’99
• Control Hazards
  • Branch mispredictions: Data-dependent conditions
  • Computed branches (late binding): Next() late binding method calls
  • L1 instruction cache misses: Code footprint of all operators in query plan exceeds L1 cache
• Data hazards
  • Dependencies between instructions: Tree, List, Hash traversal
  • L1 data cache misses: Complex NSM record navigation, Large Tree/Hash Structures
• SIMD, Out-of-order Execution: work on one tuple at a time
Operators (PROJECT, SELECT, SCAN)
  Iterator interface
  - open()
  - next(): tuple
  - close()
DBMS Computational Efficiency 
TPC-H 1GB, query 1
• selects 98% of fact table, computes net prices and aggregates all
• Results:
  • C program: 0.2s
  • MySQL: 26.2s
  • DBMS “X”: 28.1s
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
MonetDB
a column-store 
• “save disk I/O when scan-intensive queries need a few columns”
• “avoid an expression interpreter to improve computational efficiency”
RISC Database Algebra 
SELECT id, name, (age-30)*50 as bonus
FROM people
WHERE age > 30
CPU happy? Give it “nice” code!
- few dependencies (control, data)
- CPU gets out-of-order execution
- compiler can e.g. generate SIMD
One loop for an entire column
- no per-tuple interpretation
- arrays: no record navigation
- better instruction cache locality
    void batcalc_minus_int(int* res,
                           int* col,
                           int val,
                           int n)
    {
        int i;  /* loop index over the column */
        for (i = 0; i < n; i++)
            res[i] = col[i] - val;  /* subtract the constant from every value */
    }
Simple, hard-coded semantics in operators
MATERIALIZED intermediate results
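To make the "materialized intermediate results" point concrete, here is a sketch of how (age-30)*50 could be evaluated as two column-at-a-time operations. `batcalc_minus_int` follows the signature shown on the slide; `batcalc_mul_int` and `compute_bonus` are hypothetical names introduced here for illustration and are not confirmed MonetDB functions.

```c
#include <stddef.h>

/* Subtract a constant from a whole column (as on the slide). */
void batcalc_minus_int(int *res, int *col, int val, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = col[i] - val;
}

/* Hypothetical sibling primitive: multiply a whole column by a constant. */
void batcalc_mul_int(int *res, int *col, int val, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = col[i] * val;
}

/* bonus = (age - 30) * 50, decomposed into two simple operations.
 * tmp is the fully materialized intermediate column (age - 30). */
void compute_bonus(int *bonus, int *tmp, int *age, int n)
{
    batcalc_minus_int(tmp, age, 30, n);
    batcalc_mul_int(bonus, tmp, 50, n);
}
```

Each primitive is a tight loop the compiler can optimize, but the intermediate `tmp` is written out in full before the second operation starts, which is the cost the later slides refer to as result materialization.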
a column-store 
• “save disk I/O when scan-intensive queries need a few columns”
• “avoid an expression interpreter to improve computational efficiency”
How?
• RISC query algebra: hard-coded semantics
  • Decompose complex expressions in multiple operations
• Operators only handle simple arrays
  • No code that handles slotted buffered record layout
• Relational algebra becomes array manipulation language
  • Often SIMD for free
• Plus: use of cache-conscious algorithms for Sort/Aggr/Join
a Faustian pact 
• You want efficiency
  • Simple hard-coded operators
• I take scalability
  • Result materialization
• C program: 0.2s
• MonetDB: 3.7s
• MySQL: 26.2s
• DBMS “X”: 28.1s
as a research platform
SIGMOD 1985
MonetDB supports SQL, XML, ODMG, .. RDF
RDF support on C-STORE / SW-Store
MonetDB BAT Algebra
• “MIL Primitives for Querying a Fragmented World”, Boncz, Kersten, VLDBJ’98
• “Flattening an Object Algebra to Provide Performance” Boncz, Wilschut, Kersten, ICDE’98
• “MonetDB/XQuery: a fast XQuery processor powered by a relational engine” Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06
• “SW-Store: a vertically partitioned DBMS for Semantic Web data management” Abadi, Marcus, Madden, Hollenbach, VLDBJ’09
The Software Stack 
[Figure: the MonetDB software stack. Labels: extensible query language frontend framework (XQuery, SQL 03, RDF, SOAP, Arrays, OGIS, “..SGL?”); compile; Optimizers (extensible dynamic runtime QOPT framework); MonetDB 4, MonetDB 5; MonetDB kernel (extensible architecture-conscious execution platform).]
optimizer frontend backend 
as a research platform 
l Cache-Conscious Joins
l Cost Models, Radix-cluster, Radix-decluster
• “Database Architecture Optimized for the New Bottleneck: Memory Access”, VLDB’99
• “Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)
• “Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
l MonetDB/XQuery:
l structural joins exploiting positional column access
• “MonetDB/XQuery: a fast XQuery processor powered by a relational engine”, Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06
l Cracking:
l on-the-fly automatic indexing without workload knowledge
• “Database Cracking”, CIDR’07
• “Updating a cracked database”, SIGMOD’07
• “Self-organizing tuple reconstruction in column-stores”, SIGMOD’09 (all Idreos, Manegold, Kersten)
l Recycling:
l using materialized intermediates
• “An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09
l Run-time Query Optimization:
l correlation-aware run-time optimization without cost model
• “ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09
Column-Oriented 
Database Systems 
“MonetDB/X100” 
vectorized query processing 
VLDB 
2009 
Tutorial
MonetDB spin-off: MonetDB/X100 
Materialization vs Pipelining 
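[Figure: pipelined query plan with the operators SCAN, SELECT and PROJECT (bottom to top); each operator pulls from its child by calling next(). SCAN reads a table with three columns: (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25).]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05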
[Figure: the plan SCAN, SELECT, PROJECT connected by next() calls. The columns (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25) flow from SCAN to SELECT, which evaluates “> 30 ?” on the third column: FALSE, TRUE, TRUE, FALSE.]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
[Figure: the plan SCAN, SELECT, PROJECT connected by next() calls, with the vectors CPU cache resident. SCAN reads the columns (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25); SELECT “> 30 ?” (FALSE, TRUE, TRUE, FALSE) keeps (102, 104), (ivan, peggy), (37, 45); PROJECT computes “- 30” (7, 15) and “* 50” (350, 750).]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
“Vectorized In Cache Processing”
vector = array of ~100, processed in a tight loop
Observations:
next() called much less often => more time spent in primitives, less in overhead
primitive calls process an array of values in a loop:
[Figure: the vectorized plan SCAN, SELECT, PROJECT connected by next() calls. SCAN reads the columns (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25); SELECT “> 30 ?” (FALSE, TRUE, TRUE, FALSE) keeps (102, 104), (ivan, peggy), (37, 45); PROJECT computes “- 30” (7, 15) and “* 50” (350, 750).]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Observations:
next() called much less often => more time spent in primitives, less in overhead
primitive calls process an array of values in a loop:
CPU Efficiency depends on “nice” code
- out-of-order execution
- few dependencies (control,data)
- compiler support
Compilers like simple loops over arrays
- loop-pipelining
- automatic SIMD
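The observation that next() is called far less often can be sketched as follows. The Scan struct, scan_next and sum_sub are hypothetical names, and the vector size of 100 is simply the "~100" mentioned on the slides; this is an illustration under those assumptions, not the X100 implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { VECTOR_SIZE = 100 };  /* assumed; the slides only say "~100" */

/* Hypothetical scan operator over one in-memory int column. */
typedef struct {
    const int *col;    /* the full column */
    size_t     len;    /* number of values in the column */
    size_t     pos;    /* next unread position */
    size_t     calls;  /* how often next() has been invoked */
} Scan;

/* Returns a pointer to the next vector and stores its length in *n;
 * *n == 0 signals the end of the input. */
const int *scan_next(Scan *s, size_t *n)
{
    assert(s != NULL && n != NULL);
    const int *vec = s->col + s->pos;
    size_t left = s->len - s->pos;
    *n = left < VECTOR_SIZE ? left : VECTOR_SIZE;
    s->pos += *n;
    s->calls++;
    return vec;
}

/* Primitive: a tight loop over one vector, no function call per value. */
void map_sub_int_vec_int_val(int *res, const int *vec, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = vec[i] - val;
}

/* Pulls vectors from the scan and sums (value - val) over the column. */
long sum_sub(Scan *s, int val)
{
    int res[VECTOR_SIZE];  /* small intermediate, intended to stay in cache */
    long total = 0;
    for (;;) {
        size_t n;
        const int *vec = scan_next(s, &n);
        if (n == 0)
            break;
        map_sub_int_vec_int_val(res, vec, val, n);
        for (size_t i = 0; i < n; i++)
            total += res[i];
    }
    return total;
}
```

For a column of 250 values, scan_next runs 4 times (vectors of 100, 100 and 50 values, then the empty vector), where a tuple-at-a-time iterator would be called once per value.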
[Figure: the plan SCAN, SELECT, PROJECT, annotated with the loop that each primitive runs.]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Observations:
next() called much less often => more time spent in primitives, less in overhead
primitive calls process an array of values in a loop:
SELECT “> 30 ?” (FALSE, TRUE, TRUE, FALSE):
    for(i=0; i<n; i++)
        res[i] = (col[i] > x);
PROJECT “- 30” (7, 15):
    for(i=0; i<n; i++)
        res[i] = (col[i] - x);
PROJECT “* 50” (350, 750):
    for(i=0; i<n; i++)
        res[i] = (col[i] * x);
CPU Efficiency depends on “nice” code
- out-of-order execution
- few dependencies (control,data)
- compiler support
Compilers like simple loops over arrays
- loop-pipelining
- automatic SIMD
[Figure: the plan SCAN, SELECT, PROJECT over the columns (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25); SELECT “> 30 ?” produces the selection vector (1, 2) instead of copying the qualifying values.]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Tricks being played:
- Late materialization
- Materialization avoidance using selection vectors
[Figure: the plan SCAN, SELECT, PROJECT over the columns (101, 102, 104, 105), (alice, ivan, peggy, victor), (22, 37, 45, 25); SELECT “> 30 ?” produces the selection vector (1, 2), and PROJECT computes “- 30” (7, 15) and “* 50” (350, 750) for the selected positions only.]
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Tricks being played:
- Late materialization
- Materialization avoidance using selection vectors
void map_mul_flt_val_flt_col(
    float *res,
    int *sel,
    float val,
    float *col, int n)
{
    for(int i=0; i<n; i++)
        res[i] = val * col[sel[i]];
}
selection vectors used to reduce vector copying
contain selected positions
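To make the role of the selection vector concrete, the sketch below runs the slide's example end to end: select the positions where the value exceeds 30, then subtract 30 and multiply by 50 for those positions only. The primitive names are modelled on map_mul_flt_val_flt_col but are hypothetical, and int is used instead of float to keep the example exact.

```c
#include <assert.h>
#include <stddef.h>

/* Selection primitive: stores the positions i with col[i] > val in sel
 * and returns how many positions were selected. No values are copied. */
size_t select_gt_int_col_int_val(int *sel, const int *col, int val, size_t n)
{
    assert(sel != NULL && col != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] > val)
            sel[k++] = (int)i;
    return k;
}

/* Map primitive that reads its input only at the selected positions;
 * the k results are written densely to res. */
void map_sub_int_col_int_val(int *res, const int *sel,
                             const int *col, int val, size_t k)
{
    for (size_t j = 0; j < k; j++)
        res[j] = col[sel[j]] - val;
}

/* Map primitive over a dense vector (no selection vector needed). */
void map_mul_int_val_int_col(int *res, int val, const int *col, size_t k)
{
    for (size_t j = 0; j < k; j++)
        res[j] = val * col[j];
}
```

With col = {22, 37, 45, 25}, the selection yields sel = {1, 2}; the subtraction then produces {7, 15} and the multiplication {350, 750}, matching the values in the figure. The original column is never copied or compacted.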
MonetDB/X100
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
l Both efficiency
l Vectorized primitives
l and scalability..
l Pipelined query evaluation
n C program: 0.2s 
n MonetDB/X100: 0.6s 
n MonetDB: 3.7s 
n MySQL: 26.2s 
n DBMS “X”: 28.1s 
Memory Hierarchy
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
[Figure: memory hierarchy with (RAID) disk(s), RAM and CPU cache; labels: ColumnBM (buffer manager), X100 query engine.]
Memory Hierarchy
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Vectors are only the in-cache representation
RAM & disk representation might actually be different
(vectorwise uses both PAX & DSM)
[Figure: memory hierarchy with (RAID) disk(s), RAM and CPU cache; labels: ColumnBM (buffer manager), X100 query engine.]
Optimal Vector size?
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
All vectors together should fit the CPU cache
Optimizer should tune this, given the query characteristics.
[Figure: memory hierarchy with (RAID) disk(s), RAM and CPU cache; labels: ColumnBM (buffer manager), X100 query engine.]
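The rule that all vectors together should fit the CPU cache amounts to simple arithmetic, sketched here. The function name and the numbers in the usage note (a 32 KB cache budget, 8 live vectors of 4-byte values) are assumptions chosen for illustration; the slides give no concrete figures.

```c
#include <assert.h>
#include <stddef.h>

/* Largest vector length such that num_vectors vectors holding values of
 * value_size bytes each fit together in cache_bytes. Returns 0 when not
 * even one value per vector fits. */
size_t max_vector_len(size_t cache_bytes, size_t num_vectors, size_t value_size)
{
    assert(num_vectors > 0 && value_size > 0);
    return cache_bytes / (num_vectors * value_size);
}
```

Under the assumed numbers, max_vector_len(32 * 1024, 8, 4) gives 1024 values per vector; a query with more live vectors or wider values would need shorter vectors.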
Varying the Vector size
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Fewer and fewer iterator.next() and primitive function calls (“interpretation overhead”)
Varying the Vector size
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
Vectors start to exceed the CPU cache, causing additional memory traffic
Varying the Vector size
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
The benefit of selection vectors
MonetDB/MIL materializes columns
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
[Figure: memory hierarchy with (RAID) disk(s), RAM and CPU cache; label: ColumnBM (buffer manager).]
Benefits of Vectorized Processing
l 100x fewer Function Calls
l iterator.next(), primitives
l No Instruction Cache Misses
l High locality in the primitives
l Fewer Data Cache Misses
l Cache-conscious data placement
l No Tuple Navigation
l Primitives are record-oblivious, only see arrays
l Vectorization allows algorithmic optimization
l Move activities out of the loop (“strength reduction”)
l Compiler-friendly function bodies
l Loop-pipelining, automatic SIMD
“MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
“Buffering Database Operations for Enhanced Instruction Cache Performance” Zhou, Ross, SIGMOD’04
“Block oriented processing of relational database operations in modern computer architectures” Padmanabhan, Malkemus, Agarwal, ICDE’01
Vectorizing Relational Operators
“Balancing Vectorized Query Execution with Bandwidth Optimized Storage” Zukowski, CWI 2008
l Project 
l Select 
l Exploit selectivities, test buffer overflow 
l Aggregation 
l Ordered, Hashed 
l Sort 
l Radix-sort nicely vectorizes 
l Join 
l Merge-join + Hash-join 
Column-Oriented 
Database Systems 
Efficient Column Store Compression 
VLDB 
2009 
Tutorial
Key Ingredients 
“Super-Scalar RAM-CPU Cache Compression” 
Zukowski, Heman, Nes, Boncz, ICDE’06 
l Compress relations on a per-column basis 
l Columns compress well 
l Decompress small vectors of tuples from a column into 
the CPU cache 
l Minimize main-memory overhead 
l Use light-weight, CPU-efficient algorithms 
l Exploit processing power of modern CPUs 
Vectorized Decompression
Idea:
decompress a vector only
compression:
- between CPU and RAM
- instead of disk and RAM (classic)
[Figure: memory hierarchy with (RAID) disk(s), RAM and CPU cache; labels: ColumnBM (buffer manager), X100 query engine.]
RAM-Cache Decompression
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Decompress vectors on-demand into the cache
l RAM-Cache boundary only crossed once
l More (compressed) data cached in RAM
l Less bandwidth use
Multi-Core Bandwidth & Compression 
Performance Degradation with Concurrent Queries 
[Figure: chart of average query speed (clock normalized, axis from 0 to 1.2) against the number of concurrent queries (1 to 8). Series: Intel® Xeon E5410 (Harpertown), Q6b Normal and Q6b Compressed; Intel® Xeon X5560 (Nehalem), Q6b Normal and Q6b Compressed.]
CPU Efficient Decompression
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Decoding loop over cache-resident vectors of code words 
l Avoid control dependencies within decoding loop 
l no if-then-else constructs in loop body 
l Avoid data dependencies between loop iterations 
Disk Block Layout
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Forward growing section of arbitrary size code words (code word size fixed per block)
l Backwards growing exception list
Naïve Decompression Algorithm
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Use reserved value from code word domain (MAXCODE) to mark exception positions
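A minimal sketch of such a naive decoder is shown below, assuming 8-bit code words on top of a frame-of-reference base value and MAXCODE = 255. The function name, the types and the plain in-order exceptions array are simplifications made for this example; the block layout with a backwards growing exception list is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAXCODE = 255 };  /* reserved code word that marks an exception */

/* Naive decoding: every value goes through an if-then-else, so the CPU
 * has to predict for each position whether it holds an exception. */
void decode_naive(int32_t *out, const uint8_t *code, size_t n,
                  int32_t base, const int32_t *exceptions)
{
    assert(out != NULL && code != NULL);
    size_t e = 0;  /* index of the next unread exception */
    for (size_t i = 0; i < n; i++) {
        if (code[i] == MAXCODE)
            out[i] = exceptions[e++];  /* value did not fit a code word */
        else
            out[i] = base + code[i];   /* regular value: base plus offset */
    }
}
```

The branch inside the loop body is the control dependency that becomes expensive once exceptions are frequent enough to be hard to predict.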
Deterioration With Exception%
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l 1.2GB/s deteriorates to 0.4GB/s
Deterioration With Exception%
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Perf Counters: CPU mispredicts if-then-else
Patching
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Maintain a patch-list through code word section that links exception positions
l After decoding, patch up the exception positions with the correct values
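The two steps above can be sketched as two separate loops. This is an illustration under stated assumptions rather than the paper's code: code words are 8 bits over a frame-of-reference base, the code word at an exception position is taken to hold the distance to the next exception, the position of the first exception is passed in by the caller, and every gap is assumed to fit in a code word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patched decoding in two passes. */
void decode_patched(int32_t *out, const uint8_t *code, size_t n,
                    int32_t base, const int32_t *exceptions,
                    size_t num_exceptions, size_t first_exception)
{
    /* Pass 1: decode every position with no branch in the loop body.
     * Exception positions receive a meaningless value for now. */
    for (size_t i = 0; i < n; i++)
        out[i] = base + code[i];

    /* Pass 2: walk the linked patch list and overwrite the exceptions. */
    size_t cur = first_exception;
    for (size_t e = 0; e < num_exceptions; e++) {
        assert(cur < n);
        out[cur] = exceptions[e];
        cur += code[cur];  /* code word = distance to the next exception */
    }
}
```

Both loops have simple bodies: the first has no control dependency at all, and the second only touches the exception positions, so its cost grows with the number of exceptions rather than with the vector length.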
“Super-Scalar RAM-CPU Cache Compression” 
Zukowski, Heman, Nes, Boncz, ICDE’06 
Patched Decompression 
Decompression Bandwidth
“Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
l Patching makes two passes, but is faster!
Patching can be applied to:
• Frame of Reference (PFOR)
• Delta Compression (PFOR-DELTA)
• Dictionary Compression (PDICT)
Makes these methods much more applicable to noisy data
=> without additional cost
Column-Oriented 
Database Systems 
Conclusion 
VLDB 
2009 
Tutorial
Summary (1/2) 
l Column-Stores and Row-Stores: different?
l No fundamental differences 
l Can current row-stores simulate column-stores now? 
l not efficiently: row-stores need change 
l On disk layout vs execution layout 
l actually independent issues, on-the-fly conversion pays off 
l column favors sequential access, row random 
l Mixed Layout schemes 
l Fractured mirrors 
l PAX, Clotho 
l Data morphing 
Summary (2/2) 
l Crucial Columnar Techniques 
l Storage 
l Lean headers, sparse indices, fast positional access 
l Compression 
l Operating on compressed data 
l Lightweight, vectorized decompression 
l Late vs Early materialization 
l Non-join: LM always wins 
l Naïve/Invisible/Jive/Flash/Radix Join (LM often wins) 
l Execution 
l Vectorized in-cache execution 
l Exploiting SIMD 
Future Work 
l looking at write/load tradeoffs in column-stores 
l read-only vs batch loads vs trickle updates vs OLTP 
Updates (1/3) 
l Column-stores are update-in-place averse 
l In-place: I/O for each column 
l + re-compression 
l + multiple sorted replicas 
l + sparse tree indices 
Update-in-place is infeasible! 
Updates (2/3) 
l Column-stores use differential mechanisms instead 
l Differential lists/files or more advanced (e.g. PDTs) 
l Updates buffered in RAM, merged on each query 
l Checkpointing merges differences in bulk sequentially 
l I/O trends favor this anyway 
§ trade RAM for converting random into sequential I/O 
§ this trade is also needed in Flash (do not write randomly!) 
l How high a load can it sustain?
§ Depends on available RAM for buffering (how long until full)
§ Checkpoint must be done within that time
§ The longer it can run, the less it disturbs queries
§ Using Flash for buffering differences buys a lot of time 
§ Hundreds of GBs of differences per server 
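The idea of buffering updates in RAM and merging them on each query can be sketched with a simple differential list. The Diff struct and scan_merge are hypothetical names, and the sketch only covers in-place value modifications held in a list sorted by position with unique positions; inserts, deletes and more advanced structures such as PDTs are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* One buffered modification: a position in the stable column and the
 * value that should be visible at that position. */
typedef struct {
    size_t pos;
    int    value;
} Diff;

/* Copies stable[0..n) to out while applying the buffered differences.
 * diffs must be sorted by ascending pos, with unique positions below n.
 * The stable column itself is never written. */
void scan_merge(int *out, const int *stable, size_t n,
                const Diff *diffs, size_t num_diffs)
{
    size_t d = 0;  /* index of the next difference to apply */
    for (size_t i = 0; i < n; i++) {
        if (d < num_diffs && diffs[d].pos == i)
            out[i] = diffs[d++].value;  /* buffered update wins */
        else
            out[i] = stable[i];         /* untouched stable value */
    }
    assert(d == num_diffs);  /* all differences were consumed */
}
```

Because the stable column is only read, the on-disk data stays sequentially laid out until a checkpoint rewrites it in bulk; the cost is the extra merge work on each scan.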
Updates (3/3) 
l Differential transactions favored by hardware trends 
l Snapshot semantics accepted by the user community 
l can always convert to serialized 
“Serializable Isolation For Snapshot Databases” Alomari, Cahill, Fekete, Roehm, SIGMOD’08
=> Row stores could also use differential transactions and be efficient!
=> Implies a departure from ARIES
=> Implies a full rewrite
My conclusion:
a system that combines rows and columns needs differentially implemented transactions.
Starting from a pure column-store, this is a limited add-on. 
Starting from a pure row-store, this implies a full rewrite. 
Future Work 
l looking at write/load tradeoffs in column-stores 
l read-only vs batch loads vs trickle updates vs OLTP 
l database design for column-stores 
l column-store specific optimizers 
l compression/materialization/join tricks => cost models?
l hybrid column-row systems
l can row-stores learn new column tricks?
l Study of the minimal number of changes one needs to make to a row store to get the majority of the benefits of a column-store
l Alternative: add features to column-stores that make them more like row stores
Conclusion 
l Columnar techniques provide clear benefits for: 
l Data warehousing, BI 
l Information retrieval, graphs, e-science 
l A number of crucial techniques make them effective 
l Without these, existing row systems do not benefit 
l Row-Stores and column-stores could be combined 
l Row-stores may adopt some column-store techniques 
l Column-stores add row-store (or PAX) functionality 
l Many open issues to do research on! 
VLDB 2009 Tutorial Column-Oriented Database Systems 161

More Related Content

Viewers also liked

Programming methodology lecture11
Programming methodology lecture11Programming methodology lecture11
Programming methodology lecture11NYversity
 
Machine learning (8)
Machine learning (8)Machine learning (8)
Machine learning (8)NYversity
 
Programming methodology lecture03
Programming methodology lecture03Programming methodology lecture03
Programming methodology lecture03NYversity
 
Computer network (18)
Computer network (18)Computer network (18)
Computer network (18)NYversity
 
Data structures
Data structuresData structures
Data structuresNYversity
 
Programming methodology lecture14
Programming methodology lecture14Programming methodology lecture14
Programming methodology lecture14NYversity
 
Machine learning (3)
Machine learning (3)Machine learning (3)
Machine learning (3)NYversity
 
Design patterns
Design patternsDesign patterns
Design patternsNYversity
 
Programming methodology lecture26
Programming methodology lecture26Programming methodology lecture26
Programming methodology lecture26NYversity
 
Computer network (2)
Computer network (2)Computer network (2)
Computer network (2)NYversity
 
Computer network (3)
Computer network (3)Computer network (3)
Computer network (3)NYversity
 
Programming methodology lecture23
Programming methodology lecture23Programming methodology lecture23
Programming methodology lecture23NYversity
 
Programming methodology lecture27
Programming methodology lecture27Programming methodology lecture27
Programming methodology lecture27NYversity
 
Programming methodology lecture06
Programming methodology lecture06Programming methodology lecture06
Programming methodology lecture06NYversity
 
Programming methodology lecture17
Programming methodology lecture17Programming methodology lecture17
Programming methodology lecture17NYversity
 
Computer network (5)
Computer network (5)Computer network (5)
Computer network (5)NYversity
 
Computer network (15)
Computer network (15)Computer network (15)
Computer network (15)NYversity
 
Computer network (10)
Computer network (10)Computer network (10)
Computer network (10)NYversity
 
Computer network (14)
Computer network (14)Computer network (14)
Computer network (14)NYversity
 

Viewers also liked (19)

Programming methodology lecture11
Programming methodology lecture11Programming methodology lecture11
Programming methodology lecture11
 
Machine learning (8)
Machine learning (8)Machine learning (8)
Machine learning (8)
 
Programming methodology lecture03
Programming methodology lecture03Programming methodology lecture03
Programming methodology lecture03
 
Computer network (18)
Computer network (18)Computer network (18)
Computer network (18)
 
Data structures
Data structuresData structures
Data structures
 
Programming methodology lecture14
Programming methodology lecture14Programming methodology lecture14
Programming methodology lecture14
 
Machine learning (3)
Machine learning (3)Machine learning (3)
Machine learning (3)
 
Design patterns
Design patternsDesign patterns
Design patterns
 
Programming methodology lecture26
Programming methodology lecture26Programming methodology lecture26
Programming methodology lecture26
 
Computer network (2)
Computer network (2)Computer network (2)
Computer network (2)
 
Computer network (3)
Computer network (3)Computer network (3)
Computer network (3)
 
Programming methodology lecture23
Programming methodology lecture23Programming methodology lecture23
Programming methodology lecture23
 
Programming methodology lecture27
Programming methodology lecture27Programming methodology lecture27
Programming methodology lecture27
 
Programming methodology lecture06
Programming methodology lecture06Programming methodology lecture06
Programming methodology lecture06
 
Programming methodology lecture17
Programming methodology lecture17Programming methodology lecture17
Programming methodology lecture17
 
Computer network (5)
Computer network (5)Computer network (5)
Computer network (5)
 
Computer network (15)
Computer network (15)Computer network (15)
Computer network (15)
 
Computer network (10)
Computer network (10)Computer network (10)
Computer network (10)
 
Computer network (14)
Computer network (14)Computer network (14)
Computer network (14)
 

Similar to Database system

Best Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesBest Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesDATAVERSITY
 
Data Bases - Introduction to data science
Data Bases - Introduction to data scienceData Bases - Introduction to data science
Data Bases - Introduction to data scienceFrank Kienle
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Improving performance of decision support queries in columnar cloud database ...
Improving performance of decision support queries in columnar cloud database ...Improving performance of decision support queries in columnar cloud database ...
Improving performance of decision support queries in columnar cloud database ...Serkan Özal
 
C-Store-s553-stonebraker.ppt
C-Store-s553-stonebraker.pptC-Store-s553-stonebraker.ppt
C-Store-s553-stonebraker.pptJinwenZhong1
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
SQL, Oracle, Joins
SQL, Oracle, JoinsSQL, Oracle, Joins
SQL, Oracle, JoinsGaurish Goel
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSabdurrobsoyon
 
CodeFest 2014. Осипов К. — NoSQL: вангуем вместе
CodeFest 2014. Осипов К. — NoSQL: вангуем вместеCodeFest 2014. Осипов К. — NoSQL: вангуем вместе
CodeFest 2014. Осипов К. — NoSQL: вангуем вместеCodeFest
 
Sql a practical introduction
Sql   a practical introductionSql   a practical introduction
Sql a practical introductionHasan Kata
 
Sql a practical introduction
Sql   a practical introductionSql   a practical introduction
Sql a practical introductionsanjaychauhan689
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
Summary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in TokyoSummary of "Cassandra" for 3rd nosql summer reading in Tokyo
Summary of "Cassandra" for 3rd nosql summer reading in TokyoCLOUDIAN KK
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandraPL dream
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesUlf Wendel
 

Database system

  • 1. Column-Oriented Database Systems (VLDB 2009 Tutorial)
      Part 1: Stavros Harizopoulos (HP Labs)
      Part 2: Daniel Abadi (Yale)
      Part 3: Peter Boncz (CWI)
      Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)
  • 2. What is a column-store?
      • row-store (Date, Store, Product, Customer, Price kept together per record)
          + easy to add/modify a record
          - might read in unnecessary data
      • column-store (Date, Store, Product, Customer, Price each kept separately)
          + only need to read in relevant data
          - tuple writes require multiple accesses
      => suitable for read-mostly, read-intensive, large data repositories
  • 3. Are these two fundamentally different?
      • The only fundamental difference is the storage layout
      • However: we need to look at the big picture
      [Timeline figure, ‘70s, ‘80s, ‘90s, ‘00s, today: row-stores, row-stores++, row-stores++; different storage layouts proposed; new applications and a new bottleneck in hardware; column-stores; converge?]
      • How did we get here, and where we are heading (Part 1)
      • What are the column-specific optimizations? (Part 2)
      • How do we improve CPU efficiency when operating on Cs (Part 3)
  • 4. Outline
      • Part 1: Basic concepts (Stavros)
          • Introduction to key features
          • From DSM to column-stores and performance tradeoffs
          • Column-store architecture overview
          • Will rows and columns ever converge?
      • Part 2: Column-oriented execution (Daniel)
      • Part 3: MonetDB/X100 and CPU efficiency (Peter)
  • 5. Telco Data Warehousing example
      “One Size Fits All? - Part 2: Benchmarking Results” Stonebraker et al. CIDR 2007
      • Typical DW installation
      • Real-world example
      [Figure: star schema; fact table: usage; dimension tables: source, account, toll]

      QUERY 2
      SELECT account.account_number,
             sum (usage.toll_airtime),
             sum (usage.toll_price)
      FROM usage, toll, source, account
      WHERE usage.toll_id = toll.toll_id
        AND usage.source_id = source.source_id
        AND usage.account_id = account.account_id
        AND toll.type_ind in ('AE', 'AA')
        AND usage.toll_price > 0
        AND source.type != 'CIBER'
        AND toll.rating_method = 'IS'
        AND usage.invoice_date = 20051013
      GROUP BY account.account_number

                 Column-store   Row-store
      Query 1    2.06           300
      Query 2    2.20           300
      Query 3    0.09           300
      Query 4    5.24           300
      Query 5    2.88           300

      Why? Three main factors (next slides)
  • 6. Telco example explained (1/3): read efficiency
      • row store: read pages containing entire rows
          • one row = 212 columns! Is this typical? (it depends)
      • column store: read only columns needed
          • in this example: 7 columns
      • caveats:
          • “select *” not any faster
          • clever disk prefetching
          • clever tuple reconstruction
      • What about vertical partitioning? (it does not work with ad-hoc queries)
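The read-efficiency argument on this slide comes down to simple arithmetic. The sketch below is only an illustration: `fraction_read` is a hypothetical helper, and it assumes all columns have the same width, which the slide does not state.

```python
def fraction_read(columns_needed: int, total_columns: int, column_store: bool) -> float:
    """Fraction of the table's bytes a scan has to read.

    Assumption (not from the slides): every column has the same width.
    """
    if column_store:
        # A column store touches only the columns the query references.
        return columns_needed / total_columns
    # A row store reads pages that contain entire rows.
    return 1.0


row_fraction = fraction_read(7, 212, column_store=False)
col_fraction = fraction_read(7, 212, column_store=True)
print(f"row store: {row_fraction:.1%} of the data, column store: {col_fraction:.1%}")
```

With real data the ratio depends on the actual column widths and on compression, so this is a rough upper-level estimate rather than a measurement.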
  • 7. Telco example explained (2/3): compression efficiency
      • Columns compress better than rows
          • Typical row-store compression ratio 1 : 3
          • Column-store 1 : 10
      • Why?
          • Rows contain values from different domains => more entropy, difficult to dense-pack
          • Columns exhibit significantly less entropy
          • Examples: Male, Female, Female, Female, Male; 1998, 1998, 1999, 1999, 1999, 2000
      • Caveat: CPU cost (use lightweight compression)
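Run-length encoding is one common lightweight scheme that benefits from low-entropy columns. The slide does not name a specific scheme, so the choice of RLE here is an assumption, and `rle_encode` / `rle_decode` are illustrative names. The sketch uses the two example columns from the slide.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


years = [1998, 1998, 1999, 1999, 1999, 2000]
genders = ["Male", "Female", "Female", "Female", "Male"]

year_runs = rle_encode(years)      # 6 values stored as 3 runs
gender_runs = rle_encode(genders)  # 5 values stored as 3 runs

# Interleaving the two domains, as a row layout would, leaves no runs to exploit.
interleaved = [x for pair in zip(genders, years) for x in pair]
interleaved_runs = rle_encode(interleaved)
```

Sorting a column lengthens its runs further, which is why sorted columns tend to compress better.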
  • 8. Telco example explained (3/3): sorting & indexing efficiency
      • Compression and dense-packing free up space
          • Use multiple overlapping column collections
          • Sorted columns compress better
          • Range queries are faster
          • Use sparse clustered indexes
      • What about heavily-indexed row-stores? (works well for single column access, cross-column joins become increasingly expensive)
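A sparse clustered index over a sorted column can be sketched as follows. This is a minimal illustration under assumptions that are not in the slides: the block size, the function names, and the in-memory list standing in for disk blocks are all hypothetical.

```python
from bisect import bisect_left, bisect_right

BLOCK = 4  # values per block; an illustrative choice


def build_sparse_index(sorted_column, block=BLOCK):
    """Keep only the first value of every block (one index entry per block)."""
    return [sorted_column[i] for i in range(0, len(sorted_column), block)]


def range_positions(sorted_column, index, lo, hi, block=BLOCK):
    """Positions whose value lies in [lo, hi], scanning only candidate blocks."""
    # Block before the first block starting at >= lo (it may still end with lo).
    first_block = max(bisect_left(index, lo) - 1, 0)
    # Last block whose first value is <= hi.
    last_block = bisect_right(index, hi) - 1
    if last_block < 0:
        return []
    start = first_block * block
    stop = min((last_block + 1) * block, len(sorted_column))
    return [pos for pos in range(start, stop) if lo <= sorted_column[pos] <= hi]


column = [1, 3, 3, 5, 7, 7, 9, 11, 13, 15]
index = build_sparse_index(column)          # one entry per block of 4 values
hits = range_positions(column, index, 5, 9)
```

Because the column is sorted, the index needs only one entry per block, and a range query touches a contiguous run of blocks.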
  • 9. Additional opportunities for column-stores
      • Block-tuple / vectorized processing (Part 3)
          • Easier to build block-tuple operators
              • Amortizes function-call cost, improves CPU cache performance
          • Easier to apply vectorized primitives
              • Software-based: bitwise operations
              • Hardware-based: SIMD
      • Opportunities with compressed columns (more in Part 2)
          • Avoid decompression: operate directly on compressed
          • Delay decompression (and tuple reconstruction)
              • Also known as: late materialization
      • Exploit columnar storage in other DBMS components
          • Physical design (both static and dynamic). See: Database Cracking, from CWI
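To make "operate directly on compressed" concrete, here is a small sketch that aggregates over run-length-encoded data without expanding it. It assumes RLE as the compression scheme and uses hypothetical function names; the tutorial's Part 2 covers the actual techniques.

```python
def rle_sum(runs):
    """SUM over an RLE column: one multiplication per run, no decompression."""
    return sum(value * length for value, length in runs)


def rle_count_where(runs, predicate):
    """COUNT(*) with a predicate, evaluated once per run rather than per value."""
    return sum(length for value, length in runs if predicate(value))


# RLE form of the column 1998, 1998, 1999, 1999, 1999, 2000
runs = [(1998, 2), (1999, 3), (2000, 1)]

total = rle_sum(runs)
recent = rle_count_where(runs, lambda year: year >= 1999)
```

The work is proportional to the number of runs, not the number of values, which is where the CPU savings come from.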
  • 10. Effect on C-Store performance
      “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008.
      [Figure: time (sec), average for SSBM queries on C-store; bars labeled: original C-store; column-oriented join algorithm; enable compression & operate on compressed; enable late materialization]
  • 11. Summary of column-store key features
      • Storage layout
          • columnar storage
          • header/ID elimination
          • compression
          • multiple sort orders
      • Execution engine
          • column operators
          • avoid decompression
          • late materialization
          • vectorized operations
      • Design tools, optimizer
      (Each feature is annotated on the slide with the part that covers it: Part 1, Part 2, or Part 3.)
  • 12. Outline
      • Part 1: Basic concepts (Stavros)
          • Introduction to key features
          • From DSM to column-stores and performance tradeoffs
          • Column-store architecture overview
          • Will rows and columns ever converge?
      • Part 2: Column-oriented execution (Daniel)
      • Part 3: MonetDB/X100 and CPU efficiency (Peter)
  • 13. From DSM to Column-stores
      • 70s - 1985: TOD: Time Oriented Database. Wiederhold et al. "A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 1975
      • More 1970s: Transposed files, Lorie, Batory, Svensson. “An overview of cantor: a new system for data analysis” Karasalo, Svensson, SSDBM 1983
      • 1985: DSM paper. “A decomposition storage model” Copeland and Khoshafian. SIGMOD 1985.
      • 1990s: Commercialization through SybaseIQ
      • Late 90s - 2000s: Focus on main-memory performance
          • DSM “on steroids” [1997 - now]: CWI: MonetDB
          • Hybrid DSM/NSM [2001 - 2004]: Wisconsin: PAX, Fractured Mirrors; Michigan: Data Morphing; CMU: Clotho
      • 2005 - : Re-birth of read-optimized DSM as “column-store”
          • MIT: C-Store
          • CWI: MonetDB/X100
          • 10+ startups
  • 14. The original DSM paper
      “A decomposition storage model” Copeland and Khoshafian. SIGMOD 1985.
      • Proposed as an alternative to NSM
      • 2 indexes: clustered on ID, non-clustered on value
      • Speeds up queries projecting few columns
      • Requires more storage
      [Figure: one attribute stored as a two-column (ID, value) table; IDs 1, 2, 3, 4, ..; values 0100, 0962, 1000, ..]
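The decomposition described on this slide can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the function names are hypothetical, IDs are assigned as 1-based row numbers, and Python lists and dicts stand in for the clustered and non-clustered indexes.

```python
def decompose(rows, attributes):
    """Split NSM rows into one (ID, value) table per attribute.

    Assumption: the surrogate ID is simply the 1-based row number.
    """
    return {
        name: [(row_id, row[k]) for row_id, row in enumerate(rows, start=1)]
        for k, name in enumerate(attributes)
    }


def project(dsm, attributes):
    """Rebuild tuples for the requested attributes by joining on ID."""
    lookups = [dict(dsm[name]) for name in attributes]
    ids = sorted(lookups[0])
    return [tuple(lookup[row_id] for lookup in lookups) for row_id in ids]


rows = [("Joe", 45, "NY"), ("Sue", 37, "LA"), ("Ann", 52, "SF")]
dsm = decompose(rows, ["name", "age", "city"])

# A query projecting two of the three columns never touches the "city" table.
name_age = project(dsm, ["name", "age"])

# Second copy ordered by value, standing in for the non-clustered index.
age_by_value = sorted(dsm["age"], key=lambda pair: pair[1])
```

Storing an ID next to every value, plus the second ordering, is the extra storage the slide refers to.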
  • 15. Memory wall and PAX
      • 90s: Cache-conscious research
          • from: “Cache Conscious Algorithms for Relational Query Processing.” Shatdal, Kant, Naughton. VLDB 1994.
          • to: “Database Architecture Optimized for the New Bottleneck: Memory Access.” Boncz, Manegold, Kersten. VLDB 1999.
          • and: “DBMSs on a modern processor: Where does time go?” Ailamaki, DeWitt, Hill, Wood. VLDB 1999.
      • PAX: Partition Attributes Across
          “Weaving Relations for Cache Performance.” Ailamaki, DeWitt, Hill, Skounakis, VLDB 2001.
          • Retains NSM I/O pattern
          • Optimizes cache-to-RAM communication
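The PAX idea can be illustrated with a small sketch: records are still assigned to pages as in NSM, but inside each page the values are grouped by attribute into mini-pages. The function name and the use of nested lists as "pages" are assumptions made for illustration only.

```python
def to_pax_pages(rows, records_per_page):
    """Group rows into pages, then store each page as one mini-page per attribute."""
    pages = []
    for start in range(0, len(rows), records_per_page):
        chunk = rows[start:start + records_per_page]
        # zip(*chunk) transposes the chunk: one sequence per attribute.
        pages.append([list(mini_page) for mini_page in zip(*chunk)])
    return pages


rows = [("Joe", 45), ("Sue", 37), ("Ann", 52), ("Bob", 29), ("Eve", 61)]
pages = to_pax_pages(rows, records_per_page=2)

# Scanning only the age attribute touches one mini-page per page.
ages = [age for page in pages for age in page[1]]
```

Each page still holds whole records, so the I/O pattern matches NSM, while a scan of one attribute reads contiguous values within the page.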
  • 16. More hybrid NSM/DSM schemes
      • Dynamic PAX: Data Morphing
          “Data morphing: an adaptive, cache-conscious storage technique.” Hankins, Patel, VLDB 2003.
      • Clotho: custom layout using scatter-gather I/O
          “Clotho: Decoupling Memory Page Layout from Storage Organization.” Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 2004.
      • Fractured mirrors
          • Smart mirroring with both NSM/DSM copies
          “A Case For Fractured Mirrors.” Ramamurthy, DeWitt, Su, VLDB 2002.
  • 17. MonetDB (more in Part 3)
      • Late 1990s, CWI: Boncz, Manegold, and Kersten
      • Motivation:
          • Main-memory
          • Improve computational efficiency by avoiding expression interpreter
          • DSM with virtual IDs natural choice
          • Developed new query execution algebra
      • Initial contributions:
          • Pointed out memory-wall in DBMSs
          • Cache-conscious projections and joins
          • …
  • 18. 2005: the (re)birth of column-stores
      • New hardware and application realities
          • Faster CPUs, larger memories, disk bandwidth limit
          • Multi-terabyte Data Warehouses
      • New approach: combine several techniques
          • Read-optimized, fast multi-column access, disk/CPU efficiency, light-weight compression
      • C-store paper:
          • First comprehensive design description of a column-store
      • MonetDB/X100
          • “proper” disk-based column store
      • Explosion of new products
  • 19. Performance tradeoffs: columns vs. rows
      DSM traditionally was not favored by technology trends. How has this changed?
      • Optimized DSM in “Fractured Mirrors,” 2002
      • “Apples-to-apples” comparison
          “Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06
      • Follow-up study
          “Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08
      • Main-memory DSM vs. NSM
          “DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08
      • Flash-disks: a come-back for PAX?
          “Fast Scans and Joins Using Flash Drives” Shah, Harizopoulos, Wiener, Graefe. DaMoN’08
          “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09
  • 20. Fractured mirrors: a closer look
      “A Case For Fractured Mirrors” Ramamurthy, DeWitt, Su, VLDB 2002.
      • Store DSM relations inside a B-tree
          • Leaf nodes contain values
          • Eliminate IDs, amortize header overhead
          • Custom implementation on Shore
      [Figure: column data stored as (TID, value) tuples, each with its own tuple header (1 a1, 2 a2, 3 a3, 4 a4, 5 a5), compared with a sparse B-tree on ID whose leaves store one ID followed by a run of values (1: a1 a2 a3; 4: a4 a5)]
      • Similar: storage density comparable to column stores
          “Efficient columnar storage in B-trees” Graefe. Sigmod Record 03/2007.
  • 21. Fractured mirrors: performance
      • From PAX paper: regular DSM
      • Optimized DSM
          • Chunk-based tuple merging
              • Read in segments of M pages
              • Merge segments in memory
              • Becomes CPU-bound after 5 pages
      [Figures: time vs. columns projected (1 to 5), row vs. column, for regular DSM and for optimized DSM]
  • 22. Column-scanner implementation
      “Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06
      Example query: SELECT name, age WHERE age > 40
      • row scanner: scan (S) the rows (1 Joe 45, 2 Sue 37, …), apply predicate(s), output (Joe, 45), …
      • column scanner: scan (S) the age column (45, 37, …) and apply predicate #1, producing (#POS, 45), …; then scan (S) the name column (Joe, Sue, …) at those positions and output (Joe, 45), …
      • Direct I/O: prefetch ~100ms worth of data
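The pipeline of column scanners on this slide can be sketched as two steps: the first scanner applies the predicate and emits (position, value) pairs, and the next scanner fetches its column only at those positions. The function names and in-memory lists are illustrative assumptions; the paper's scanners work on disk pages with direct I/O.

```python
def scan_with_predicate(column, predicate):
    """First scanner: apply the predicate and emit (position, value) pairs."""
    return [(pos, value) for pos, value in enumerate(column) if predicate(value)]


def scan_at_positions(column, qualifying):
    """Next scanner: read this column only at the qualifying positions."""
    return [(column[pos], value) for pos, value in qualifying]


names = ["Joe", "Sue"]
ages = [45, 37]

# SELECT name, age WHERE age > 40
qualifying = scan_with_predicate(ages, lambda age: age > 40)
result = scan_at_positions(names, qualifying)
```

With a selective predicate, the second scanner touches few values, which is the case where the column scanner uses the CPU most efficiently.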
  • 23. Scan performance
      • Large prefetch hides disk seeks in columns
      • Column-CPU efficiency with lower selectivity
      • Row-CPU suffers from memory stalls
      • Memory stalls disappear in narrow tuples
      • Compression: similar to narrow (not shown, details in the paper)
  • 24. Even more results
      “Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08
      • Same engine as before
      • Additional findings
          • wide attributes: same as before
          • narrow & compressed tuple: CPU-bound!
          • Non-selective queries, narrow tuples, favor well-compressed rows
          • Materialized views are a win
          • Scan times determine early materialized joins
      [Figure: time (s), 0 to 35, vs. columns returned, 1 to 10; series C-25%, C-10%, R-50%]
      Column-joins are covered in Part 2!
  • 25. Speedup of columns over rows
      “Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06
      [Figure: speedup of columns over rows as a function of tuple width (8 to 36) and cycles per disk byte (cpdb: 9, 18, 36, 72, 144); regions marked with the symbols +++, _=, +, ++]
      • Rows favored by narrow tuples and low cpdb
      • Disk-bound workloads have higher cpdb
  • 26. Varying prefetch size (no competing disk traffic)
      [Figure: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 32; series: Column 2, Column 8, Column 16, Column 48 (x 128KB), and Row (any prefetch size)]
      • No prefetching hurts columns in single scans
  • 27. Varying prefetch size with competing disk traffic
      [Figures: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 28; one panel compares Column, 48 with Row, 48 and the other compares Column, 8 with Row, 8]
      • No prefetching hurts columns in single scans
      • Under competing traffic, columns outperform rows for any prefetch size
  • 28. CPU Performance
      “DSM vs. NSM: CPU performance trade offs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08
      • Benefit in on-the-fly conversion between NSM and DSM
      • DSM: sequential access (block fits in L2), random in L1
      • NSM: random access, SIMD for grouped Aggregation
  • 29. New storage technology: Flash SSDs
      • Performance characteristics
          • very fast random reads, slow random writes
          • fast sequential reads and writes
      • Price per bit (capacity follows)
          • cheaper than RAM, order of magnitude more expensive than Disk
      • Flash Translation Layer introduces unpredictability
          • avoid random writes!
      • Form factors not ideal yet
          • SSD (=> small reads still suffer from SATA overhead/OS limitations)
          • PCI card (=> high price, limited expandability)
      • Boost Sequential I/O in a simple package
          • Flash RAID: very tight bandwidth/cm3 packing (4GB/sec inside the box)
      • Column Store Updates
          • useful for delta structures and logs
      • Random I/O on flash fixes unclustered index access
          • still suboptimal if I/O block size > record size
          • therefore column stores profit much less than horizontal stores
      • Random I/O useful to exploit secondary, tertiary table orderings
          • the larger the data, the deeper clustering one can exploit
  • 30. Even faster column scans on flash SSDs
    “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09
    - New-generation SSDs
      - Very fast random reads, slower random writes
      - Fast sequential RW, comparable to HDD arrays
    - No expensive seeks across columns
    - FlashScan and FlashJoin: PAX on SSDs, inside Postgres
    30K Read IOps, 3K Write IOps; 250MB/s Read BW, 200MB/s Write
    mini-pages with no qualified attributes are not accessed
  • 31. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Column-scan performance over time from 7x slower ..to 1.2x slower ..to same column-store (2006) ..to 2x slower and 3x faster! regular DSM (2001) optimized DSM (2002) SSD Postgres/PAX (2009) VLDB 2009 Tutorial Column-Oriented Database Systems 31
  • 32. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Outline l Part 1: Basic concepts — Stavros l Introduction to key features l From DSM to column-stores and performance tradeoffs l Column-store architecture overview l Will rows and columns ever converge? l Part 2: Column-oriented execution — Daniel l Part 3: MonetDB/X100 and CPU efficiency — Peter VLDB 2009 Tutorial Column-Oriented Database Systems 32
  • 33. Architecture of a column-store
    - storage layout
      - read-optimized: dense-packed, compressed
      - organize in extents, batch updates
      - multiple sort orders
      - sparse indexes
    - engine
      - block-tuple operators
      - new access methods
      - optimized relational operators
    - system-level
      - system-wide column support
      - loading / updates
      - scaling through multiple nodes
      - transactions / redundancy
  • 34. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) C-Store l Compress columns l No alignment l Big disk blocks “C-Store: A Column-Oriented DBMS.” Stonebraker et al. VLDB 2005. l Only materialized views (perhaps many) l Focus on Sorting not indexing l Data ordered on anything, not just time l Automatic physical DBMS design l Optimize for grid computing l Innovative redundancy l Xacts – but no need for Mohan l Column optimizer and executor VLDB 2009 Tutorial Column-Oriented Database Systems 34
  • 35. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) C-Store: only materialized views (MVs) l Projection (MV) is some number of columns from a fact table l Plus columns in a dimension table – with a 1-n join between Fact and Dimension table l Stored in order of a storage key(s) l Several may be stored! l With a permutation, if necessary, to map between them l Table (as the user specified it and sees it) is not stored! l No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one) User view: EMP (name, age, salary, dept) Dept (dname, floor) Possible set of MVs: MV-1 (name, dept, floor) in floor order MV-2 (salary, age) in age order MV-3 (dname, salary, name) in salary order VLDB 2009 Tutorial Column-Oriented Database Systems 35
  • 36. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Continuous Load and Query (Vertica) Hybrid Storage Architecture TUPLE MOVER Asynchronous Data Transfer > Read Optimized Store (ROS) • On disk • Sorted / Compressed • Segmented • Large data loaded direct A B C (A B C | A) Trickle Load > Write Optimized Store (WOS) A B C §Memory based §Unsorted / Uncompressed §Segmented §Low latency / Small quick inserts VLDB 2009 Tutorial Column-Oriented Database Systems 36
  • 37. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Loading Data (Vertica) Write-Optimized Store (WOS) In-memory Automatic Tuple Mover Read-Optimized Store (ROS) On-disk > INSERT, UPDATE, DELETE > Bulk and Trickle Loads §COPY §COPY DIRECT > User loads data into logical Tables > Vertica loads atomically into storage VLDB 2009 Tutorial Column-Oriented Database Systems 37
  • 38. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Applications for column-stores l Data Warehousing l High end (clustering) l Mid end/Mass Market l Personal Analytics l Data Mining l E.g. Proximity l Google BigTable l RDF l Semantic web data management l Information retrieval l Terabyte TREC l Scientific datasets l SciDB initiative l SLOAN Digital Sky Survey on MonetDB VLDB 2009 Tutorial Column-Oriented Database Systems 38
  • 39. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) List of column-store systems l Cantor (history) l Sybase IQ l SenSage (former Addamark Technologies) l Kdb l 1010data l MonetDB l C-Store/Vertica l X100/VectorWise l KickFire l SAP Business Accelerator l Infobright l ParAccel l Exasol VLDB 2009 Tutorial Column-Oriented Database Systems 39
  • 40. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Outline l Part 1: Basic concepts — Stavros l Introduction to key features l From DSM to column-stores and performance tradeoffs l Column-store architecture overview l Will rows and columns ever converge? l Part 2: Column-oriented execution — Daniel l Part 3: MonetDB/X100 and CPU efficiency — Peter VLDB 2009 Tutorial Column-Oriented Database Systems 40
  • 41. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Simulate a Column-Store inside a Row-Store Date Store Product Customer Price Date 01/01 01/01 01/01 Option B: Index Every Column Date Index Store Index Store Product Customer Price VLDB 2009 Tutorial Column-Oriented Database Systems 41 1 2 3 1 2 3 1 2 3 Option A: Vertical Partitioning … 1 2 3 1 2 3 01/01 01/01 01/01 BOS NYC BOS Table Chair Bed Mesa Lutz Mudd $20 $13 $79 BOS NYC BOS Table Chair Bed Mesa Lutz Mudd $20 $13 $79 TID Value TID Value TID Value TID Value TID Value
  • 42. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Simulate a Column-Store inside a Row-Store Date Store Product Customer Price Vertical Partitioning Store TID Value 1 2 3 BOS NYC BOS Product TID Value 1 2 3 Option A: Option B: Index Every Column Date Index Store Index … Table Chair Customer TID Value 1 2 3 Price TID Value 1 2 3 01/01 01/01 01/01 Bed Mesa Lutz Mudd $20 $13 $79 BOS NYC BOS Table Chair Bed Mesa Lutz Mudd $20 $13 $79 Date Value StartPos Length 01/01 1 3 Can explicitly run-length encode date “Teaching an Old Elephant New Tricks.” Bruno, CIDR 2009. VLDB 2009 Tutorial Column-Oriented Database Systems 42
  • 43. Experiments
    “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
    “Adjoined Dimension Column Index (ADC Index) to Improve Star Schema Query Performance”. O’Neil et al. ICDE 2008.
    - Star Schema Benchmark (SSBM)
    - Implemented by professional DBA
    - Original row-store plus 2 column-store simulations on same row-store product
    Average time (seconds): Normal Row-Store 25.7; Vertically Partitioned Row-Store 79.9; Row-Store With All Indexes 221.2
  • 44. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) What’s Going On? Vertical Partitions l Vertical partitions in row-stores: l Work well when workload is known l ..and queries access disjoint sets of columns l See automated physical design l Do not work well as full-columns l TupleID overhead significant l Excessive joins “Column-Stores vs. Row-Stores: How Different Are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008. TID Column 1 2 3 Data Tuple Header Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns Complete fact table takes up ~4 GB (compressed) Vertically partitioned tables take up 0.7-1.1 GB (compressed) VLDB 2009 Tutorial Column-Oriented Database Systems 44
  • 45. What’s Going On? All Indexes Case
    - Tuple construction
    - Common type of query:
        SELECT store_name, SUM(revenue)
        FROM Facts, Stores
        WHERE Facts.store_id = Stores.store_id AND Stores.country = 'Canada'
        GROUP BY store_name
    - Result of lower part of query plan is a set of TIDs that passed all predicates
    - Need to extract SELECT attributes at these TIDs
      - BUT: index maps value to TID
      - You really want to map TID to value (i.e., a vertical partition)
    => Tuple construction is SLOW
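The mismatch between the two mappings can be sketched in a few lines. This is an illustrative toy, not code from the tutorial: the `store_index` and `store_partition` dictionaries and both helper functions are hypothetical names, and a real index would be a B-tree rather than a dictionary.

```python
# Hypothetical Store column for three tuples (TIDs 1..3).
store_index = {"BOS": [1, 3], "NYC": [2]}          # index: value -> TIDs
store_partition = {1: "BOS", 2: "NYC", 3: "BOS"}   # vertical partition: TID -> value


def value_via_index(index, tid):
    """Find the value stored at `tid` by scanning every index entry."""
    for value, tids in index.items():
        if tid in tids:
            return value
    raise KeyError(tid)


def value_via_partition(partition, tid):
    """Find the value stored at `tid` with a single direct lookup."""
    return partition[tid]


qualifying_tids = [1, 3]
print([value_via_index(store_index, t) for t in qualifying_tids])
print([value_via_partition(store_partition, t) for t in qualifying_tids])
```

Both calls return the same values; the difference is that the index path has to search by value for every TID, which is the cost the slide calls slow tuple construction.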
  • 46. So…
    - All indexes approach is a poor way to simulate a column-store
    - Problems with vertical partitioning are NOT fundamental
      - Store tuple header in a separate partition
      - Allow virtual TIDs
      - Combine clustered indexes, vertical partitioning
    - So can row-stores simulate column-stores?
      - Might be possible, BUT:
        - Need better support for vertical partitioning at the storage layer
        - Need support for column-specific optimizations at the executor level
        - Full integration: buffer pool, transaction manager, ..
    - When will this happen?
      - Most promising features = soon (see Part 2, Part 3 for most promising features)
      - ..unless new technology / new objectives change the game (SSDs, Massively Parallel Platforms, Energy-efficiency)
  • 47. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) End of Part 1 l Basic concepts — Stavros l Introduction to key features l From DSM to column-stores and performance tradeoffs l Column-store architecture overview l Will rows and columns ever converge? l Part 2: Column-oriented execution — Daniel l Part 3: MonetDB/X100 and CPU efficiency — Peter VLDB 2009 Tutorial Column-Oriented Database Systems 47
  • 48. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Part 2 Outline l Compression l Tuple Materialization l Joins VLDB 2009 Tutorial Column-Oriented Database Systems 48
  • 49. Compression
    “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
    “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi, Madden, and Ferreira, SIGMOD’06
    “Query optimization in compressed database systems” Chen, Gehrke, Korn, SIGMOD’01
  • 50. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Compression l Trades I/O for CPU l Increased column-store opportunities: l Higher data value locality in column stores l Techniques such as run length encoding far more useful l Can use extra space to store multiple copies of data in different sort orders VLDB 2009 Tutorial Column-Oriented Database Systems 50
  • 51. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Run-length Encoding Quarter Product ID Q1 Q1 Q1 Q1 Q1 Q1 Q1 … Q2 Q2 Q2 Q2 … 1 1 1 1 1 2 2 … 1 1 1 2 … Quarter Product ID (value, start_pos, run_length) (value, start_pos, run_length) (1, 1, 5) (2, 6, 2) … (1, 301, 3) (2, 304, 1) … (Q1, 1, 300) (Q2, 301, 350) (Q3, 651, 500) (Q4, 1151, 600) Price 5 7 2 9 6 8 5 … 3 8 1 4 … Price 5 7 2 9 6 8 5 … 3 8 1 4 … VLDB 2009 Tutorial Column-Oriented Database Systems 51
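A minimal sketch of the (value, start_pos, run_length) encoding shown on this slide, using 1-based positions as in the figure. The function names `rle_encode` and `rle_decode` are illustrative choices, not part of any system described in the tutorial.

```python
from itertools import groupby


def rle_encode(column):
    """Encode a column as (value, start_pos, run_length) triples (1-based positions)."""
    triples = []
    pos = 1
    for value, run in groupby(column):
        length = sum(1 for _ in run)
        triples.append((value, pos, length))
        pos += length
    return triples


def rle_decode(triples):
    """Expand the triples back into the original column."""
    column = []
    for value, _start, length in triples:
        column.extend([value] * length)
    return column


product_id = [1, 1, 1, 1, 1, 2, 2]
encoded = rle_encode(product_id)
print(encoded)  # [(1, 1, 5), (2, 6, 2)]
```

The first two triples match the Product ID runs (1, 1, 5) and (2, 6, 2) in the slide's example.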
  • 52. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Bit-vector Encoding “Integrating Compression and Execution in Column- Oriented Database Systems” Abadi et. al, SIGMOD ’06 Product ID 1 1 1 1 1 2 2 … 1 1 2 3 … ID: 1 ID: 2 ID: 3 1 1 1 1 1 0 0 … 1 1 0 0 … … 0 0 0 0 0 0 0 … 0 0 0 0 … 0 0 0 0 0 0 0 … 0 0 0 1 … 0 0 0 0 0 1 1 … 0 0 1 0 … l For each unique value, v, in column c, create bit-vector b l b[i] = 1 if c[i] = v l Good for columns with few unique values l Each bit-vector can be further compressed if sparse VLDB 2009 Tutorial Column-Oriented Database Systems 52
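The rule "b[i] = 1 if c[i] = v" can be sketched as follows. The function name `bitvector_encode` is an assumption for illustration, and the bit-vectors are plain Python lists rather than packed bits.

```python
def bitvector_encode(column):
    """Build one bit-vector per unique value: vectors[v][i] == 1 iff column[i] == v."""
    vectors = {}
    for i, value in enumerate(column):
        if value not in vectors:
            vectors[value] = [0] * len(column)
        vectors[value][i] = 1
    return vectors


product_id = [1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3]
vectors = bitvector_encode(product_id)
for value in sorted(vectors):
    print(value, vectors[value])
```

Each position has exactly one bit set across all vectors, which is why the scheme only pays off for columns with few unique values.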
  • 53. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Integrating Compression and Execution in Column- Oriented Database Systems” Abadi et. al, SIGMOD ’06 Dictionary Encoding Quarter Q1 Q2 Q4 Q1 Q3 Q1 Q1 Q1 Q2 Q4 Q3 Q3 … Quarter 0130200 1322 Quarter 24 128 122 + Dictionary 24: Q1, Q2, Q4, Q1 … 0 + Dictionary 0: Q1 1: Q2 2: Q3 3: Q4 OR 122: Q2, Q4, Q3, Q3 … 128: Q3, Q1, Q1, Q1 l For each unique value create dictionary entry l Dictionary can be per-block or per-column l Column-stores have the advantage that dictionary entries may encode multiple values at once VLDB 2009 Tutorial Column-Oriented Database Systems 53
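A minimal sketch of the per-column variant, where each unique value gets one small integer code. Assigning codes in sorted order is an assumption made here so that the result lines up with the slide's dictionary (0: Q1, 1: Q2, 2: Q3, 3: Q4); the function names are illustrative.

```python
def dict_encode(column):
    """Return (dictionary, codes) where dictionary[code] is the original value."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def dict_decode(dictionary, codes):
    """Map each code back to its dictionary entry."""
    return [dictionary[code] for code in codes]


quarter = ["Q1", "Q2", "Q4", "Q1", "Q3", "Q1", "Q1", "Q1", "Q2", "Q4", "Q3", "Q3"]
dictionary, codes = dict_encode(quarter)
print(dictionary)
print(codes)
```

The multi-value variant on the slide (one code standing for a group such as Q1, Q2, Q4, Q1) would use tuples of values as dictionary keys instead of single values.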
  • 54. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Frame Of Reference Encoding Price 45 54 48 55 51 53 40 50 49 62 52 50 … Price Frame: 50 -5 4 -2 5 1 3 ∞ 40 0 -1 ∞ 62 2 0 … 4 bits per value Exceptions (see part 3 for a better way to deal with exceptions) l Encodes values as b bit offset from chosen frame of reference l Special escape code (e.g. all bits set to 1) indicates a difference larger than can be stored in b bits l After escape code, original (uncompressed) value is written “Compressing Relations and Indexes ” Goldstein, Ramakrishnan, Shaft, ICDE’98 VLDB 2009 Tutorial Column-Oriented Database Systems 54
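The escape-code mechanism can be sketched as below, using the slide's Price column and frame 50. One detail is an assumption of this sketch rather than something the slide specifies: offsets are stored with a bias so that every code is an unsigned b-bit number and the all-ones code stays free for the escape.

```python
def for_encode(values, frame, bits=4):
    """Frame-of-reference encode; out-of-range values follow an escape code."""
    escape = (1 << bits) - 1   # all bits set to 1
    bias = escape // 2         # assumption: biased offsets keep codes unsigned
    stream = []
    for value in values:
        code = value - frame + bias
        if 0 <= code < escape:
            stream.append(code)
        else:
            stream.append(escape)
            stream.append(value)   # original uncompressed value follows the escape
    return stream


def for_decode(stream, frame, bits=4):
    """Invert for_encode."""
    escape = (1 << bits) - 1
    bias = escape // 2
    values = []
    it = iter(stream)
    for code in it:
        values.append(next(it) if code == escape else code - bias + frame)
    return values


price = [45, 54, 48, 55, 51, 53, 40, 50, 49, 62, 52, 50]
stream = for_encode(price, frame=50)
print(stream)
```

With 4 bits and this bias, offsets from -7 to 7 fit, so 40 and 62 become the two exceptions, as in the slide.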
  • 55. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Differential Encoding Time 5:00 5:02 5:03 5:03 5:04 5:06 5:07 5:08 5:10 5:15 5:16 5:16 … Time 5:00 2 1 0 1 2 1 1 2 ∞ 5:15 1 0 2 bits per value Exception (see part 3 for a better way to deal with exceptions) l Encodes values as b bit offset from previous value l Special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bits l After escape code, original (uncompressed) value is written l Performs well on columns containing increasing/decreasing sequences l inverted lists l timestamps l object IDs l sorted / clustered columns “Improved Word-Aligned Binary Compression for Text Indexing” Ahn, Moffat, TKDE’06 VLDB 2009 Tutorial Column-Oriented Database Systems 55
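A minimal sketch of the same idea with deltas from the previous value. The timestamps are represented as minutes since midnight (5:00 is 300), and only non-negative deltas are encoded in b bits; both choices are assumptions made to keep the example short.

```python
def delta_encode(values, bits=2):
    """Store the first value raw, then b-bit deltas; large deltas use an escape."""
    escape = (1 << bits) - 1
    stream = [values[0]]
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if 0 <= delta < escape:
            stream.append(delta)
        else:
            stream.extend([escape, cur])   # escape code, then the raw value
    return stream


def delta_decode(stream, bits=2):
    """Invert delta_encode."""
    escape = (1 << bits) - 1
    it = iter(stream)
    values = [next(it)]
    for code in it:
        values.append(next(it) if code == escape else values[-1] + code)
    return values


# 5:00 5:02 5:03 5:03 5:04 5:06 5:07 5:08 5:10 5:15 5:16 5:16
times = [300, 302, 303, 303, 304, 306, 307, 308, 310, 315, 316, 316]
stream = delta_encode(times)
print(stream)  # [300, 2, 1, 0, 1, 2, 1, 1, 2, 3, 315, 1, 0]
```

The single escape (code 3 followed by 315) corresponds to the jump from 5:10 to 5:15 marked as the exception on the slide.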
  • 56. What Compression Scheme To Use?
    - Does column appear in the sort key?
      - yes: Is the average run-length > 2?
        - yes: RLE
        - no: Differential Encoding
      - no: Are number of unique values < ~50000?
        - yes: Does this column appear frequently in selection predicates?
          - yes: Bit-vector Compression
          - no: Dictionary Compression
        - no: Is the data numerical and exhibit good locality?
          - yes: Frame of Reference Encoding
          - no: Leave Data Uncompressed OR Heavyweight Compression
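One reading of this slide's flowchart can be written as a small function. The branch structure is inferred from the flattened slide text and should be treated as an assumption; the function name and parameter names are hypothetical.

```python
def choose_scheme(in_sort_key, avg_run_length, n_unique,
                  frequent_in_predicates, numeric_with_locality):
    """Pick a compression scheme following one reading of the slide's decision tree."""
    if in_sort_key:
        return "RLE" if avg_run_length > 2 else "Differential Encoding"
    if n_unique < 50_000:
        if frequent_in_predicates:
            return "Bit-vector Compression"
        return "Dictionary Compression"
    if numeric_with_locality:
        return "Frame of Reference Encoding"
    return "Uncompressed or Heavyweight Compression"


# A sorted column with long runs, e.g. Quarter in a table sorted by quarter.
print(choose_scheme(True, 300, 4, False, False))
# An unsorted, high-cardinality numeric column with good locality, e.g. Price.
print(choose_scheme(False, 1, 1_000_000, False, True))
```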
  • 57. Heavy-Weight Compression Schemes
    “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
    - Modern disk arrays can achieve > 1GB/s
    - 1/3 CPU for decompression => 3GB/s needed
    => Lightweight compression schemes are better
    => Even better: operate directly on compressed data
  • 58. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Integrating Compression and Execution in Column- Oriented Database Systems” Abadi et. al, SIGMOD ’06 Operating Directly on Compressed Data l I/O - CPU tradeoff is no longer a tradeoff l Reduces memory–CPU bandwidth requirements l Opens up possibility of operating on multiple records at once VLDB 2009 Tutorial Column-Oriented Database Systems 58
  • 59. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Integrating Compression and Execution in Column- Oriented Database Systems” Abadi et. al, SIGMOD ’06 Operating Directly on Compressed Data Quarter Product ID 1 … 1 0 0 1 1 0 0 1 0 0 0 … ProductID, COUNT(*)) (1, 3) (2, 1) (3, 2) 2 … 0 0 1 0 0 0 1 0 0 1 0 … 3 … 0 1 0 0 0 1 0 0 1 0 1 … (Q1, 1, 300) (Q2, 301, 6) (Q3, 307, 500) (Q4, 807, 600) 301-306 Index Lookup + Offset jump SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID VLDB 2009 Tutorial Column-Oriented Database Systems 59
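The query on this slide can be answered without decompressing either column. The sketch below uses a scaled-down, hypothetical table (11 rows instead of the slide's 1406) in which the Q2 run covers six positions and produces the same counts as the slide, (1, 3), (2, 1), (3, 2). The data layout and the function name are assumptions.

```python
# Quarter column, RLE-compressed as (value, start_pos, run_length), 1-based.
quarter_rle = [("Q1", 1, 3), ("Q2", 4, 6), ("Q3", 10, 2)]

# ProductID column, one bit-vector per product.
product_bits = {
    1: [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
    2: [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    3: [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
}


def count_by_product(quarter):
    """SELECT ProductID, COUNT(*) ... WHERE Quarter = quarter GROUP BY ProductID."""
    counts = {}
    for value, start, length in quarter_rle:
        if value != quarter:
            continue
        lo, hi = start - 1, start - 1 + length   # position range of the run
        for product, bits in product_bits.items():
            n = sum(bits[lo:hi])                 # count set bits in the range
            if n:
                counts[product] = counts.get(product, 0) + n
    return counts


print(count_by_product("Q2"))  # {1: 3, 2: 1, 3: 2}
```

The selection touches one RLE triple and the aggregation is a bit count over a position range, so no tuple is ever reconstructed.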
  • 60. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Integrating Compression and Execution in Column- Oriented Database Systems” Abadi et. al, SIGMOD ’06 Operating Directly on Compressed Data SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID (Q1, 1, 300) (Q2, 301, 6) (Q3, 307, 500) (Q4, 807, 600) Block API Data isOneValue() isValueSorted() isPosContiguous() isSparse() getNext() decompressIntoArray() getValueAtPosition(pos) getMin() getMax() getSize() Aggregation Operator Selection Operator Compression- Aware Scan Operator VLDB 2009 Tutorial Column-Oriented Database Systems 60
  • 61. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Column-Oriented Database Systems Tuple Materialization and Column-Oriented Join Algorithms VLDB 2009 Tutorial “Materialization Strategies in a Column- Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007. “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos Shah, Wiener, and Graefe. SIGMOD 2009. “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008. “Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04 “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09
  • 62. When should columns be projected?
    - Where should column projection operators be placed in a query plan?
    - Row-store:
      - Column projection involves removing unneeded columns from tuples
      - Generally done as early as possible
    - Column-store:
      - Operation is almost completely opposite from a row-store
      - Column projection involves reading needed columns from storage and extracting values for a listed set of tuples
        - This process is called “materialization”
      - Early materialization: project columns at beginning of query plan
        - Straightforward since there is a one-to-one mapping across columns
      - Late materialization: wait as long as possible before projecting columns
        - More complicated since selection and join operators on one column obfuscate the mapping to other columns from the same table
    - Most column-stores construct tuples at column projection time
      - Many database interfaces expect output in regular tuples (rows)
      - Rest of discussion will focus on this case
  • 63. When should tuples be constructed? Select + Aggregate
    QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
    - Solution 1: Create rows first (EM). But:
      - Need to construct ALL tuples
      - Need to decompress data
      - Poor memory bandwidth utilization
    [Figure: columns prodID (RLE triple (4,1,4), i.e. 4 4 4 4), storeID (2 1 3 1), custID (2 3 3 3) and price (7 13 42 80) feed a Construct operator that emits all four rows]
  • 64. Solution 2: Operate on columns
    QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
    [Figure: query plan with AGG above AND, fed by data sources for prodID, storeID, custID and price. The prodID data source turns 4 4 4 4 into bitmap 1 1 1 1; the storeID data source turns 2 1 3 1 into bitmap 0 1 0 1]
  • 65. Solution 2: Operate on columns
    QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
    [Figure: same query plan; the AND operator combines bitmaps 1 1 1 1 and 0 1 0 1 into 0 1 0 1]
  • 66. Solution 2: Operate on columns
    QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
    [Figure: same query plan; bitmap 0 1 0 1 is applied to the custID data source (2 3 3 3), keeping 3 3, and to the price data source (7 13 42 80), keeping 13 80]
  • 67. Solution 2: Operate on columns
    QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
    [Figure: same query plan; AGG receives custID values 3 3 and price values 13 80 and outputs custID 3 with SUM(price) = 93]
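The column-at-a-time plan from these slides can be sketched with the slide's own four-row example. Columns are plain Python lists here, and the function name is illustrative; a real engine would operate on compressed blocks and position lists.

```python
prodID = [4, 4, 4, 4]
storeID = [2, 1, 3, 1]
custID = [2, 3, 3, 3]
price = [7, 13, 42, 80]


def late_materialized_sum():
    """SELECT custID, SUM(price) WHERE prodID = 4 AND storeID = 1 GROUP BY custID."""
    # Each predicate is evaluated on its own column and yields a bitmap.
    bm_prod = [int(v == 4) for v in prodID]     # 1 1 1 1
    bm_store = [int(v == 1) for v in storeID]   # 0 1 0 1
    # AND the bitmaps, then keep only the qualifying positions.
    positions = [i for i, (a, b) in enumerate(zip(bm_prod, bm_store)) if a & b]
    # Only now fetch custID and price, and only at those positions.
    sums = {}
    for i in positions:
        sums[custID[i]] = sums.get(custID[i], 0) + price[i]
    return sums


print(late_materialized_sum())  # {3: 93}
```

Only two of the four custID and price values are ever read, which is the saving late materialization is after.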
  • 68. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007. For plans without joins, late materialization is a win QUERY: SELECT C1, SUM(C2) FROM table WHERE (C1 < CONST) AND (C2 < CONST) GROUP BY C1 l Ran on 2 compressed columns from TPC-H scale 10 data 10 9 8 7 6 5 4 3 2 1 0 Low selectivity Medium selectivity High selectivity VLDB 2009 Tutorial Column-Oriented Database Systems 68 Time (seconds) Early Materialization Late Materialization
  • 69. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007. Even on uncompressed data, late materialization is still a win QUERY: SELECT C1, SUM(C2) FROM table WHERE (C1 < CONST) AND (C2 < CONST) GROUP BY C1 l Materializing late still works best 14 12 10 8 6 4 2 0 Low selectivity Medium selectivity High selectivity Time (seconds) Early Materialization Late Materialization VLDB 2009 Tutorial Column-Oriented Database Systems 69
  • 71. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) What about for plans with joins? B, C, E, H, F B, C, E, G, H Join 1 A = D Join 2 G = K A, B, C D, E, G, H Scan Scan Scan R3 (F, K, L) B, C F G K Project Join A = D B, C, E, H, F Project A D Join G = K Scan Scan Scan R3 (F, K, L) G VLDB 2009 Tutorial Column-Oriented Database Systems 71 71 Select R1.B, R1.C, R2.E, R2.H, R3.F From R1, R2, R3 Where R1.A = R2.D AND R2.G = R3.K R1 (A, B, C) R2 (D, E, G, H) F, K R1 (A, B, C) R2 (D, E, G, H) E, H Early materialization Late materialization
  • 72. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Early Materialization Example 12 1 11 1 2 3 3 3 7 13 42 80 Construct 6 1 2 1 2 3 3 3 7 13 42 80 (4,1,4) prodID storeID quantity custID price 1 2 3 Green White Brown Construct 1 2 3 Green White Brown custID lastName QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Facts Customers VLDB 2009 Tutorial Column-Oriented Database Systems 72
  • 73. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Early Materialization Example 2 3 3 3 7 13 42 80 White Brown Brown Brown 1 2 3 Green White Brown QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName 7 13 42 80 Join VLDB 2009 Tutorial Column-Oriented Database Systems 73
  • 74. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Late Materialization Example 12 1 11 1 6 1 2 1 2 3 1 3 Join 1 2 3 4 7 13 42 80 (4,1,4) prodID storeID quantity custID price 1 2 3 Green White Brown 2 3 1 3 custID lastName QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Facts Customers Late materialized join causes out of order probing of projected columns from the inner relation VLDB 2009 Tutorial Column-Oriented Database Systems 74
  • 75. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Late Materialized Join Performance l Naïve LM join about 2X slower than EM join on typical queries (due to random I/O) l This number is very dependent on l Amount of memory available l Number of projected attributes l Join cardinality l But we can do better l Invisible Join l Jive/Flash Join l Radix cluster/decluster join VLDB 2009 Tutorial Column-Oriented Database Systems 75
  • 76. Invisible Join
    “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
    - Designed for typical joins when data is modeled using a star schema
      - One (“fact”) table is joined with multiple dimension tables
    - Typical query:
        select c_nation, s_nation, d_year, sum(lo_revenue) as revenue
        from customer, lineorder, supplier, date
        where lo_custkey = c_custkey
          and lo_suppkey = s_suppkey
          and lo_orderdate = d_datekey
          and c_region = 'ASIA'
          and s_region = 'ASIA'
          and d_year >= 1992 and d_year <= 1997
        group by c_nation, s_nation, d_year
        order by d_year asc, revenue desc;
  • 77. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Invisible Join “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008. Apply “region = ‘Asia’” On Customer Table custkey region nation … 1 ASIA CHINA … 2 ASIA INDIA … 3 ASIA INDIA … Hash Table (or bit-map) Containing Keys 1, 2 and 3 4 EUROPE FRANCE … Apply “region = ‘Asia’” On Supplier Table suppkey region nation … 1 ASIA RUSSIA … 2 EUROPE SPAIN … Hash Table (or bit-map) Containing Keys 1, 3 3 ASIA JAPAN … Apply “year in [1992,1997]” On Date Table dateid year … 01011997 1997 … 01021997 1997 … 01031997 1997 … Hash Table Containing Keys 01011997, 01021997, and 01031997 VLDB 2009 Tutorial Column-Oriented Database Systems 77
  • 78. “Column-Stores vs Row-Stores: How Different are They Really?” Abadi et al. SIGMOD 2008.
    Original Fact Table:
      orderkey  custkey  suppkey  orderdate  revenue
      1         3        1        01011997   43256
      2         3        2        01011997   33333
      3         4        3        01021997   12121
      4         1        1        01021997   23233
      5         4        2        01021997   45456
      6         1        2        01031997   43251
      7         3        2        01031997   34235
    Probing each foreign-key column against its hash table yields one bitmap per predicate:
      custkey (3 3 4 1 4 1 3) + Hash Table Containing Keys 1, 2 and 3:  1 1 0 1 0 1 1
      suppkey (1 2 3 1 2 2 2) + Hash Table Containing Keys 1 and 3:     1 0 1 1 0 0 0
      orderdate + Hash Table Containing Keys 01011997, 01021997, and 01031997:  1 1 1 1 1 1 1
    ANDing the three bitmaps gives 1 0 0 1 0 0 0
  • 79. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) custkey 3 3 4 1 4 1 3 + suppkey 1 2 3 1 2 2 2 orderdate 01011997 01011997 01021997 01021997 01021997 01031997 01031997 1 0 0 1 0 0 0 3 1 1 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 01011997 01021997 + + + + FRANCE RUSSIA SPAIN = RUSSIA JOIN CHINA INDIA INDIA = CHINA INDIA RUSSIA JAPAN 01011997 1997 01021997 1997 01031997 1997 = 1997 1997 VLDB 2009 Tutorial Column-Oriented Database Systems 79
  • 80. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Invisible Join “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008. Apply “region = ‘Asia’” On Customer Table custkey region nation … 1 ASIA CHINA … 2 ASIA INDIA … 3 ASIA INDIA … Hash Table (or bit-map) Containing Keys 1, 2 and 3 4 EUROPE FRANCE … Apply “region = ‘Asia’” On Supplier Table suppkey region nation … 1 ASIA RUSSIA … 2 EUROPE SPAIN … Range [1-3] (between-predicate rewriting) Hash Table (or bit-map) Containing Keys 1, 3 3 ASIA JAPAN … Apply “year in [1992,1997]” On Date Table dateid year … 01011997 1997 … 01021997 1997 … 01031997 1997 … Hash Table Containing Keys 01011997, 01021997, and 01031997 VLDB 2009 Tutorial Column-Oriented Database Systems 80
  • 81. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Invisible Join l Bottom Line l Many data warehouses model data using star/snowflake schemes l Joins of one (fact) table with many dimension tables is common l Invisible join takes advantage of this by making sure that the table that can be accessed in position order is the fact table for each join l Position lists from the fact table are then intersected (in position order) l This reduces the amount of data that must be accessed out of order from the dimension tables l “Between-predicate rewriting” trick not relevant for this discussion VLDB 2009 Tutorial Column-Oriented Database Systems 81
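The three phases of the invisible join can be sketched with the example data from the preceding slides. Dimension tables are modeled as dictionaries and the key sets stand in for the hash tables; all variable names are assumptions, and the final GROUP BY / ORDER BY is left out to keep the sketch short.

```python
customer = {1: ("ASIA", "CHINA"), 2: ("ASIA", "INDIA"),
            3: ("ASIA", "INDIA"), 4: ("EUROPE", "FRANCE")}
supplier = {1: ("ASIA", "RUSSIA"), 2: ("EUROPE", "SPAIN"), 3: ("ASIA", "JAPAN")}
date = {"01011997": 1997, "01021997": 1997, "01031997": 1997}

lo_custkey = [3, 3, 4, 1, 4, 1, 3]
lo_suppkey = [1, 2, 3, 1, 2, 2, 2]
lo_orderdate = ["01011997", "01011997", "01021997", "01021997",
                "01021997", "01031997", "01031997"]
lo_revenue = [43256, 33333, 12121, 23233, 45456, 43251, 34235]

# Phase 1: apply each predicate to its dimension table, keep qualifying keys.
cust_keys = {k for k, (region, _) in customer.items() if region == "ASIA"}
supp_keys = {k for k, (region, _) in supplier.items() if region == "ASIA"}
date_keys = {k for k, year in date.items() if 1992 <= year <= 1997}


def bitmap(column, keys):
    """Probe a fact-table foreign-key column in position order."""
    return [int(v in keys) for v in column]


# Phase 2: one bitmap per foreign-key column, intersected in position order.
bitmaps = zip(bitmap(lo_custkey, cust_keys),
              bitmap(lo_suppkey, supp_keys),
              bitmap(lo_orderdate, date_keys))
positions = [i for i, (c, s, d) in enumerate(bitmaps) if c & s & d]

# Phase 3: fetch the projected values only for the surviving positions.
rows = [(customer[lo_custkey[i]][1], supplier[lo_suppkey[i]][1],
         date[lo_orderdate[i]], lo_revenue[i]) for i in positions]
print(rows)
```

Only the phase 3 dimension lookups are out of position order, and they happen only for the two fact rows that survive all three predicates.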
  • 82. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) custkey 3 3 4 1 4 1 3 + suppkey 1 2 3 1 2 2 2 orderdate 01011997 01011997 01021997 01021997 01021997 01031997 01031997 1 0 0 1 0 0 0 3 1 1 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 01011997 01021997 + + + + FRANCE Still accessing table out of order RUSSIA SPAIN = RUSSIA JOIN CHINA INDIA INDIA = CHINA INDIA RUSSIA JAPAN 01011997 1997 01021997 1997 01031997 1997 = 1997 1997 VLDB 2009 Tutorial Column-Oriented Database Systems 82
  • 83. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Jive/Flash Join 3 1 + CHINA INDIA INDIA = CHINA INDIA FRANCE Still accessing table out of order “Fast Joins using Join Indices”. Li and Ross, VLDBJ 8:1-24, 1999. “Query Processing Techniques for Solid State Drives”. Tsirogiannis, Harizopoulos et. al. SIGMOD 2009. VLDB 2009 Tutorial Column-Oriented Database Systems 83
  • 84. Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Jive/Flash Join 1. Add column with dense ascending integers from 1 2. Sort new position list by second column 3. Probe projected column in order using new sorted position list, keeping first column from position list around 4. Sort new result by first column 3 1 + CHINA INDIA INDIA = CHINA INDIA FRANCE 1 2 2 1 CHINA 1 3 + INDIA INDIA = INDIA CHINA FRANCE 2 1 1 INDIA 2 CHINA VLDB 2009 Tutorial Column-Oriented Database Systems 84
  • 85. Jive/Flash Join: Bottom Line
      • Instead of probing projected columns from the inner table out of order:
          • Sort the join index
          • Probe the projected columns in order
          • Sort the result using an added column
      • LM vs EM tradeoffs:
          • LM has the extra sorts (EM accesses all columns in order)
          • LM only has to fit the join columns into memory (EM needs the join columns and all projected columns)
              • Results in big memory and CPU savings (see Part 3 for why there are CPU savings)
          • LM only has to materialize relevant columns
          • In many cases the LM advantages outweigh the disadvantages
      • LM would be a clear winner if not for those pesky sorts … can we do better?
  • 86. Radix Cluster/Decluster
      • The full sort from the Jive join is actually overkill
          • We just want to access the storage blocks in order (we don’t mind random access within a block)
      • So do a radix sort and stop early
      • By stopping early, data within each block is accessed out of order, but in the order specified in the original join index
          • Use this pseudo-order to accelerate the post-probe sort as well
      • “Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
      • “Database Architecture Optimized for the New Bottleneck: Memory Access”, VLDB’99
      • “Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)
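A minimal sketch of "sort by block and stop early", assuming power-of-two block sizes and a single stable counting-sort pass on the block number. The published radix-cluster algorithm is multi-pass and cache-tuned; the function name `radix_cluster` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Groups positions by storage block (block = pos >> block_bits) with one
 * stable counting-sort pass. Inside a block the original join-index order
 * is preserved. out_seq records each entry's original index so the result
 * can be declustered later. Assumes every pos[i] >> block_bits is below
 * num_blocks. Returns 0 on success, -1 if allocation fails. */
int radix_cluster(const int *pos, int n, int block_bits, int num_blocks,
                  int *out_pos, int *out_seq)
{
    assert(n >= 0 && block_bits >= 0 && num_blocks > 0);
    int *start = calloc((size_t)num_blocks + 1, sizeof *start);
    if (!start) return -1;
    for (int i = 0; i < n; i++)              /* histogram per block */
        start[(pos[i] >> block_bits) + 1]++;
    for (int b = 0; b < num_blocks; b++)     /* prefix sum: block offsets */
        start[b + 1] += start[b];
    for (int i = 0; i < n; i++) {            /* stable scatter */
        int b = pos[i] >> block_bits;
        int dst = start[b]++;
        out_pos[dst] = pos[i];
        out_seq[dst] = i;
    }
    free(start);
    return 0;
}
```

With `block_bits = 2`, positions 9, 1, 4, 8, 0, 5 come out grouped as (1, 0), (4, 5), (9, 8): blocks are visited in order, while entries within a block keep their join-index order.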
  • 87. Radix Cluster/Decluster: Bottom Line
      • The overhead of both sorts from the Jive join can be significantly reduced
      • Has only been tested when there is sufficient memory for the entire join index to be stored three times
          • The technique is likely applicable to larger join indexes, but its utility will go down a little
      • Only works if random access within a storage block is possible
          • Don’t use radix cluster/decluster if you have variable-width column values or compression schemes that can only be decompressed starting from the beginning of the block
  • 88. LM vs EM joins
      • The Invisible, Jive, Flash, Cluster, and Decluster techniques together form a bag of tricks to improve LM joins
      • Research papers show that with them LM joins become 2X faster than EM joins (instead of 2X slower) for a wide array of query types
  • 89. Tuple Construction Heuristics
      • For queries with selective predicates, aggregations, or compressed data, use late materialization
      • For joins:
          • Research papers: always use late materialization
          • Commercial systems: the inner table of a join is often materialized before the join (reduces system complexity)
              • Some systems will use LM only if the columns from the inner table can fit entirely in memory
  • 90. Outline
      • Computational efficiency of DB on modern hardware
          • how column-stores can help here
      • Keynote revisited: MonetDB & VectorWise in more depth
      • CPU efficient column compression
          • vectorized decompression
      • Conclusions
          • future work
  • 91. Column-Oriented Database Systems: 40 years of hardware evolution vs. DBMS computational efficiency
  • 92. CPU Architecture
      • Elements: Storage (CPU caches L1/L2/L3, Registers); Execution Unit(s) (Pipelined, SIMD)
  • 93. CPU Metrics
  • 94. CPU Metrics
  • 95. DRAM Metrics
  • 96. Super-Scalar Execution (pipelining)
  • 97. Hazards
      • Data hazards
          • Dependencies between instructions
          • L1 data cache misses
      • Control hazards
          • Branch mispredictions
          • Computed branches (late binding)
          • L1 instruction cache misses
      • Result: bubbles in the pipeline
      • Out-of-order execution addresses data hazards
          • control hazards are typically more expensive
  • 98. SIMD
      • Single Instruction Multiple Data
          • Same operation applied on a vector of values
          • MMX: 64 bits, SSE: 128 bits, AVX: 256 bits
          • SSE, e.g. multiply 8 short integers
  • 99. A Look at the Query Pipeline
      • Query: SELECT id, name, (age-30)*50 AS bonus FROM employee WHERE age > 30
      • [Figure: tuple-at-a-time pipeline SCAN → SELECT → PROJECT, connected by next() calls. SCAN yields (101, alice, 22) and (102, ivan, 37); SELECT evaluates age > 30 (FALSE for 22, TRUE for 37); PROJECT computes 37 - 30 = 7 and 7 * 50 = 350, emitting (102, ivan, 350)]
  • 100. A Look at the Query Pipeline
      • Operators: iterator interface
          • open()
          • next(): tuple
          • close()
      • [Figure: same SCAN → SELECT → PROJECT pipeline, emitting (102, ivan, 350)]
  • 101. A Look at the Query Pipeline
      • Primitives: provide computational functionality
          • All arithmetic allowed in expressions, e.g. multiplication
          • 7 * 50: mult(int,int) → int
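A tuple-at-a-time pipeline of this shape might look like the sketch below. It is only an illustration: the `Tuple` and `Operator` types, the function names, and the omission of open()/close() are assumptions made to keep the example short, and the query constants are hard-coded.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; const char *name; int age; int bonus; } Tuple;

typedef struct Operator Operator;
struct Operator {
    int (*next)(Operator *self, Tuple *out); /* 1 = tuple produced, 0 = end */
    Operator *child;
    const Tuple *rows;                        /* used by SCAN only */
    size_t nrows, cursor;
};

/* Primitive: mult(int,int) -> int, called once per tuple. */
static int mult(int a, int b) { return a * b; }

static int scan_next(Operator *self, Tuple *out)
{
    if (self->cursor >= self->nrows) return 0;
    *out = self->rows[self->cursor++];
    return 1;
}

static int select_next(Operator *self, Tuple *out)
{
    while (self->child->next(self->child, out))
        if (out->age > 30) return 1;          /* data-dependent branch */
    return 0;
}

static int project_next(Operator *self, Tuple *out)
{
    if (!self->child->next(self->child, out)) return 0;
    out->bonus = mult(out->age - 30, 50);
    return 1;
}

/* Runs SCAN -> SELECT -> PROJECT; result needs room for nrows tuples. */
size_t run_pipeline(const Tuple *rows, size_t nrows, Tuple *result)
{
    assert(rows && result);
    Operator scan = { scan_next, NULL, rows, nrows, 0 };
    Operator sel  = { select_next, &scan, NULL, 0, 0 };
    Operator proj = { project_next, &sel, NULL, 0, 0 };
    size_t n = 0;
    Tuple t;
    while (proj.next(&proj, &t))              /* one indirect call chain per tuple */
        result[n++] = t;
    return n;
}
```

Every output tuple costs three calls through function pointers plus one primitive call, which is the per-tuple interpretation overhead the next slides attribute the hazards to.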
  • 102. Database Architecture causes Hazards
      • Operators (PROJECT, SELECT, SCAN) use the iterator interface: open(), next(): tuple, close()
      • Data hazards
          • Dependencies between instructions
          • L1 data cache misses
      • Control hazards
          • Branch mispredictions
          • Computed branches (late binding)
          • L1 instruction cache misses
      • SIMD and out-of-order execution: operators work on one tuple at a time
      • Causes called out on the slide:
          • Complex NSM record navigation
          • Large Tree/Hash structures
          • Code footprint of all operators in the query plan exceeds the L1 cache
          • Data-dependent conditions
          • Next() late binding method calls
          • Tree, List, Hash traversal
      • “DBMSs On A Modern Processor: Where Does Time Go?” Ailamaki, DeWitt, Hill, Wood, VLDB’99
  • 103. DBMS Computational Efficiency
      • TPC-H 1GB, query 1
          • selects 98% of the fact table, computes net prices and aggregates all
      • Results:
          • C program: ?
          • MySQL: 26.2s
          • DBMS “X”: 28.1s
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 104. DBMS Computational Efficiency (build of slide 103)
      • Results:
          • C program: 0.2s
          • MySQL: 26.2s
          • DBMS “X”: 28.1s
  • 105. Column-Oriented Database Systems: MonetDB
  • 106. MonetDB: a column-store
      • “save disk I/O when scan-intensive queries need a few columns”
      • “avoid an expression interpreter to improve computational efficiency”
  • 107. RISC Database Algebra
      • SELECT id, name, (age-30)*50 as bonus FROM people WHERE age > 30
  • 108. RISC Database Algebra
      • SELECT id, name, (age-30)*50 as bonus FROM people WHERE age > 30
      • Primitive: void batcalc_minus_int(int* res, int* col, int val, int n) { for (int i = 0; i < n; i++) res[i] = col[i] - val; }
      • CPU happy? Give it “nice” code!
          • few dependencies (control, data)
          • CPU gets out-of-order execution
          • compiler can e.g. generate SIMD
      • One loop for an entire column
          • no per-tuple interpretation
          • arrays: no record navigation
          • better instruction cache locality
      • Simple, hard-coded semantics in operators
  • 109. RISC Database Algebra (build of slide 108)
      • MATERIALIZED intermediate results
  • 110. MonetDB: a column-store
      • “save disk I/O when scan-intensive queries need a few columns”
      • “avoid an expression interpreter to improve computational efficiency”
      • How?
          • RISC query algebra: hard-coded semantics
              • Decompose complex expressions into multiple operations
          • Operators only handle simple arrays
              • No code that handles slotted buffered record layout
          • Relational algebra becomes an array manipulation language
              • Often SIMD for free
          • Plus: use of cache-conscious algorithms for Sort/Aggr/Join
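Decomposing `(age-30)*50` into two column-at-a-time operations could look like the sketch below. `batcalc_minus_int` follows the slide; `batcalc_mul_int` and `bonus_column` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* One tight loop per operation over a whole column (from the slide). */
void batcalc_minus_int(int *res, const int *col, int val, int n)
{
    for (int i = 0; i < n; i++) res[i] = col[i] - val;
}

/* Hypothetical sibling primitive, same shape as the one above. */
void batcalc_mul_int(int *res, const int *col, int val, int n)
{
    for (int i = 0; i < n; i++) res[i] = col[i] * val;
}

/* Evaluates (age - 30) * 50 for a whole column. The intermediate column
 * tmp is fully materialized, so it has n entries. Returns 0 on success,
 * -1 if allocation fails. */
int bonus_column(const int *age, int n, int *bonus)
{
    assert(n >= 0);
    int *tmp = malloc((size_t)(n > 0 ? n : 1) * sizeof *tmp);
    if (!tmp) return -1;
    batcalc_minus_int(tmp, age, 30, n);   /* tmp = age - 30 */
    batcalc_mul_int(bonus, tmp, 50, n);   /* bonus = tmp * 50 */
    free(tmp);
    return 0;
}
```

The column-sized `tmp` is the materialization cost that the "Faustian pact" slide refers to: the loops are CPU-friendly, but intermediates grow with the data.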
  • 111. MonetDB: a Faustian pact
      • You want efficiency
          • Simple hard-coded operators
      • I take scalability
          • Result materialization
      • Results (TPC-H 1GB, query 1):
          • C program: 0.2s
          • MonetDB: 3.7s
          • MySQL: 26.2s
          • DBMS “X”: 28.1s
  • 112. Column-Oriented Database Systems: MonetDB as a research platform
  • 113. MonetDB BAT Algebra
      • [Figure labels: SIGMOD 1985; MonetDB supports SQL, XML, ODMG, .. RDF; RDF support on C-STORE / SW-Store]
      • “MIL Primitives for Querying a Fragmented World”, Boncz, Kersten, VLDBJ’98
      • “Flattening an Object Algebra to Provide Performance” Boncz, Wilschut, Kersten, ICDE’98
      • “MonetDB/XQuery: a fast XQuery processor powered by a relational engine” Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06
      • “SW-Store: a vertically partitioned DBMS for Semantic Web data management” Abadi, Marcus, Madden, Hollenbach, VLDBJ’09
  • 114. The Software Stack
      • [Figure: layered stack. Extensible query language frontend framework: XQuery, SQL 03, RDF, SOAP, Arrays, OGIS, .. SGL?; compile; Extensible Dynamic Runtime QOPT Framework!: Optimizers; MonetDB 4, MonetDB 5; Extensible Architecture-Conscious Execution platform: MonetDB kernel]
  • 115. MonetDB as a research platform (frontend, optimizer, backend)
      • Cache-Conscious Joins
          • Cost Models, Radix-cluster, Radix-decluster
          • “Database Architecture Optimized for the New Bottleneck: Memory Access” VLDB’99
          • “Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)
          • “Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
      • MonetDB/XQuery:
          • structural joins exploiting positional column access
          • “MonetDB/XQuery: a fast XQuery processor powered by a relational engine” Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06
      • Cracking:
          • on-the-fly automatic indexing without workload knowledge
          • “Database Cracking”, CIDR’07
          • “Updating a cracked database”, SIGMOD’07
          • “Self-organizing tuple reconstruction in column-stores”, SIGMOD’09 (all Idreos, Manegold, Kersten)
      • Recycling:
          • using materialized intermediates
          • “An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09
      • Run-time Query Optimization:
          • correlation-aware run-time optimization without cost model
          • “ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09
  • 116. Column-Oriented Database Systems: “MonetDB/X100” vectorized query processing
  • 117. MonetDB spin-off: MonetDB/X100
      • Materialization vs Pipelining
  • 118. [Figure: SCAN → SELECT → PROJECT pipeline connected by next() calls. SCAN produces the vectors id (101, 102, 104, 105), name (alice, ivan, peggy, victor), age (22, 37, 45, 25)]
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 119. [Figure: build of slide 118. SELECT evaluates age > 30 over the whole vector: FALSE, TRUE, TRUE, FALSE]
  • 120. “Vectorized In Cache Processing”
      • [Figure: build of slide 119. The selected tuples (102, ivan, 37) and (104, peggy, 45) reach PROJECT, which computes age - 30 = (7, 15) and then * 50 = (350, 750). Vectors are CPU cache resident]
      • vector = array of ~100 values, processed in a tight loop
      • Observations:
          • next() called much less often → more time spent in primitives, less in overhead
          • primitive calls process an array of values in a loop
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 121. (build of slide 120)
      • CPU efficiency depends on “nice” code
          • out-of-order execution
          • few dependencies (control, data)
          • compiler support
      • Compilers like simple loops over arrays
          • loop-pipelining
          • automatic SIMD
  • 122. (build of slide 121) Primitive loops shown next to the operators:
      • > 30 ?: for(i=0; i<n; i++) res[i] = (col[i] > x)
      • - 30: for(i=0; i<n; i++) res[i] = (col[i] - x)
      • * 50: for(i=0; i<n; i++) res[i] = (col[i] * x)
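The difference from full-column materialization can be sketched as a driver loop that feeds the same kind of primitives one small vector at a time. The names below and `VECTOR_SIZE = 128` are assumptions for illustration; the slides only say a vector holds roughly 100 values.

```c
#include <assert.h>

enum { VECTOR_SIZE = 128 };  /* assumption: small enough to stay in cache */

static void map_sub_int_col_int_val(int *res, const int *col, int val, int n)
{
    for (int i = 0; i < n; i++) res[i] = col[i] - val;
}

static void map_mul_int_col_int_val(int *res, const int *col, int val, int n)
{
    for (int i = 0; i < n; i++) res[i] = col[i] * val;
}

/* Computes (age - 30) * 50 vector by vector. The intermediate tmp has a
 * fixed, cache-sized footprint no matter how long the column is. */
void bonus_vectorized(const int *age, int n, int *bonus)
{
    assert(n >= 0);
    int tmp[VECTOR_SIZE];
    for (int off = 0; off < n; off += VECTOR_SIZE) {
        int len = (n - off < VECTOR_SIZE) ? n - off : VECTOR_SIZE;
        map_sub_int_col_int_val(tmp, age + off, 30, len);
        map_mul_int_col_int_val(bonus + off, tmp, 50, len);
    }
}
```

Each primitive call amortizes its call overhead over up to `VECTOR_SIZE` values, while the loops stay simple enough for a compiler to pipeline or vectorize.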
  • 123. [Figure: SCAN → SELECT → PROJECT over the vectors id (101, 102, 104, 105), name (alice, ivan, peggy, victor), age (22, 37, 45, 25); SELECT age > 30 produces the selection vector (1, 2)]
      • Tricks being played:
          • Late materialization
          • Materialization avoidance using selection vectors
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 124. [Figure: build of slide 123. PROJECT computes age - 30 = (7, 15) and * 50 = (350, 750) through the selection vector (1, 2)]
      • Primitive: void map_mul_flt_val_flt_col(float *res, int* sel, float val, float *col, int n) { for(int i=0; i<n; i++) res[i] = val * col[sel[i]]; }
      • Selection vectors are used to reduce vector copying; they contain the selected positions
      • Tricks being played:
          • Late materialization
          • Materialization avoidance using selection vectors
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 125. (same content as slide 124)
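A sketch of how a selection vector might be produced and consumed, using the slide's convention that a map primitive reads `col[sel[i]]` and writes a dense `res[i]`. The primitive names are hypothetical, and integer columns are used instead of floats to match the running example.

```c
#include <assert.h>

/* Writes the positions i with col[i] > val into sel and returns how many
 * qualified. The append is branch-free: the slot is always written, and
 * the cursor only advances on a match. */
int select_gt_int_col_int_val(int *sel, const int *col, int val, int n)
{
    assert(n >= 0);
    int k = 0;
    for (int i = 0; i < n; i++) {
        sel[k] = i;
        k += (col[i] > val);
    }
    return k;
}

/* Reads only the selected positions of col; writes a dense result. */
void map_sub_int_col_int_val_sel(int *res, const int *sel,
                                 const int *col, int val, int n)
{
    for (int i = 0; i < n; i++) res[i] = col[sel[i]] - val;
}

/* Operates on an already dense input vector. */
void map_mul_int_val_int_col(int *res, int val, const int *col, int n)
{
    for (int i = 0; i < n; i++) res[i] = val * col[i];
}
```

For the ages (22, 37, 45, 25) the select yields the selection vector (1, 2), and the two map calls produce (7, 15) and then (350, 750) without ever copying the id or name vectors.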
  • 126. MonetDB/X100
      • Both efficiency
          • Vectorized primitives
      • and scalability
          • Pipelined query evaluation
      • Results (TPC-H 1GB, query 1):
          • C program: 0.2s
          • MonetDB/X100: 0.6s
          • MonetDB: 3.7s
          • MySQL: 26.2s
          • DBMS “X”: 28.1s
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 127. Memory Hierarchy
      • [Figure: X100 query engine on top of the CPU cache; ColumnBM (buffer manager) on top of RAM; (RAID) disk(s) below]
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 128. Memory Hierarchy (build of slide 127)
      • Vectors are only the in-cache representation
      • RAM & disk representation might actually be different (VectorWise uses both PAX & DSM)
  • 129. Optimal Vector size?
      • All vectors together should fit in the CPU cache
      • The optimizer should tune this, given the query characteristics
  • 130. Varying the Vector size
      • [Figure: query performance as a function of vector size]
      • Fewer and fewer iterator.next() and primitive function calls (“interpretation overhead”)
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
  • 131. Varying the Vector size
      • Vectors start to exceed the CPU cache, causing additional memory traffic
  • 132. Varying the Vector size
      • The benefit of selection vectors
  • 133. MonetDB/MIL materializes columns
      • [Figure: memory hierarchy with ColumnBM (buffer manager), CPU cache, RAM, (RAID) disk(s)]
  • 134. Benefits of Vectorized Processing
      • 100x fewer function calls
          • iterator.next(), primitives
      • No instruction cache misses
          • High locality in the primitives
      • Fewer data cache misses
          • Cache-conscious data placement
      • No tuple navigation
          • Primitives are record-oblivious, only see arrays
      • Vectorization allows algorithmic optimization
          • Move activities out of the loop (“strength reduction”)
      • Compiler-friendly function bodies
          • Loop-pipelining, automatic SIMD
      • “MonetDB/X100: Hyper-Pipelining Query Execution” Boncz, Zukowski, Nes, CIDR’05
      • “Buffering Database Operations for Enhanced Instruction Cache Performance” Zhou, Ross, SIGMOD’04
      • “Block oriented processing of relational database operations in modern computer architectures” Padmanabhan, Malkemus, Agarwal, ICDE’01
  • 135. Vectorizing Relational Operators
      • Project
      • Select
          • Exploit selectivities, test buffer overflow
      • Aggregation
          • Ordered, Hashed
      • Sort
          • Radix-sort nicely vectorizes
      • Join
          • Merge-join + Hash-join
      • “Balancing Vectorized Query Execution with Bandwidth Optimized Storage” Zukowski, CWI 2008
  • 136. Column-Oriented Database Systems: Efficient Column Store Compression
  • 137. Key Ingredients
      • Compress relations on a per-column basis
          • Columns compress well
      • Decompress small vectors of tuples from a column into the CPU cache
          • Minimize main-memory overhead
      • Use light-weight, CPU-efficient algorithms
          • Exploit processing power of modern CPUs
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 138. Key Ingredients (first two ingredients of slide 137 only)
  • 139. Vectorized Decompression
      • [Figure: X100 query engine / CPU cache; ColumnBM (buffer manager) / RAM; (RAID) disk(s)]
      • Idea: decompress only one vector at a time
      • Compression is applied:
          • between CPU and RAM
          • instead of between disk and RAM (classic)
  • 140. RAM-Cache Decompression
      • Decompress vectors on demand into the cache
      • RAM-cache boundary is only crossed once
      • More (compressed) data cached in RAM
      • Less bandwidth use
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 141. Multi-Core Bandwidth & Compression
      • [Figure: Performance Degradation with Concurrent Queries. x-axis: number of concurrent queries (1 to 8); y-axis: avg query speed (clock normalized, 0 to 1.2). Series: Intel Xeon E5410 (Harpertown) Q6b Normal and Compressed; Intel Xeon X5560 (Nehalem) Q6b Normal and Compressed]
  • 142. CPU Efficient Decompression
      • Decoding loop over cache-resident vectors of code words
      • Avoid control dependencies within the decoding loop
          • no if-then-else constructs in the loop body
      • Avoid data dependencies between loop iterations
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 143. Disk Block Layout
      • Forward growing section of arbitrary size code words (code word size fixed per block)
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 144. Disk Block Layout (build of slide 143)
      • Backwards growing exception list
  • 145. Naïve Decompression Algorithm
      • Use a reserved value from the code word domain (MAXCODE) to mark exception positions
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
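A minimal sketch of such a naïve decoder for a frame-of-reference style encoding. Everything concrete here is an assumption for illustration: 8-bit code words, `MAXCODE = 255`, the function name `decode_naive`, and an exception array read front to back (the slides store exceptions growing backwards from the end of the block).

```c
#include <assert.h>
#include <stdint.h>

enum { MAXCODE = 255 };  /* assumption: reserved 8-bit code marks an exception */

/* Decodes n code words into out. Regular values are base + code; a code
 * equal to MAXCODE means "take the next value from the exception list". */
void decode_naive(const uint8_t *code, int n, int base,
                  const int *exc, int *out)
{
    assert(n >= 0);
    int j = 0;
    for (int i = 0; i < n; i++) {
        if (code[i] < MAXCODE)      /* data-dependent branch per value */
            out[i] = base + code[i];
        else
            out[i] = exc[j++];      /* exception: read the stored value */
    }
}
```

The if-then-else inside the loop is exactly the control dependency the previous slide warns against; it becomes costly once exceptions make the branch hard to predict.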
  • 146. Deterioration With Exception%
      • [Figure: decompression bandwidth as a function of exception percentage]
      • 1.2GB/s deteriorates to 0.4GB/s
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 147. Deterioration With Exception%
      • Perf counters: the CPU mispredicts the if-then-else
  • 148. Patching
      • Maintain a patch-list through the code word section that links exception positions
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 149. Patching (build of slide 148)
      • After decoding, patch up the exception positions with the correct values
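Patched decoding can be sketched as two branch-free loops. The layout details are assumptions for illustration: 8-bit code words, the code word at an exception position storing the gap to the next exception, a known first exception position, and a forward-read exception array; `decode_patched` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pass 1 decodes every position as if it were regular. Pass 2 walks the
 * patch list: the code word at an exception position holds the number of
 * regular values before the next exception. */
void decode_patched(const uint8_t *code, int n, int base,
                    const int *exc, int n_exc, int first_exc, int *out)
{
    assert(n >= 0 && n_exc >= 0);
    for (int i = 0; i < n; i++)          /* pass 1: no if-then-else */
        out[i] = base + code[i];
    int cur = first_exc;
    for (int j = 0; j < n_exc; j++) {    /* pass 2: patch exceptions */
        int next = cur + code[cur] + 1;  /* follow the linked list */
        out[cur] = exc[j];
        cur = next;
    }
}
```

Pass 1 writes throwaway values at exception positions, which pass 2 overwrites. Neither loop contains a data-dependent branch, in line with the slides' observation that patching makes two passes but is faster.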
  • 150. Patched Decompression
      • “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
  • 151. Patched Decompression
  • 152. Decompression Bandwidth
      • Patching makes two passes, but is faster!
      • Patching can be applied to:
          • Frame of Reference (PFOR)
          • Delta Compression (PFOR-DELTA)
          • Dictionary Compression (PDICT)
      • Makes these methods much more applicable to noisy data → without additional cost
  • 153. Column-Oriented Database Systems: Conclusion
  • 154. Summary (1/2)
      • Columns and Row-Stores: different?
          • No fundamental differences
          • Can current row-stores simulate column-stores now?
              • not efficiently: row-stores need change
      • On disk layout vs execution layout
          • actually independent issues, on-the-fly conversion pays off
          • column favors sequential access, row random
      • Mixed Layout schemes
          • Fractured mirrors
          • PAX, Clotho
          • Data morphing
  • 155. Summary (2/2)
      • Crucial Columnar Techniques
          • Storage
              • Lean headers, sparse indices, fast positional access
          • Compression
              • Operating on compressed data
              • Lightweight, vectorized decompression
          • Late vs Early materialization
              • Non-join: LM always wins
              • Naïve/Invisible/Jive/Flash/Radix Join (LM often wins)
          • Execution
              • Vectorized in-cache execution
              • Exploiting SIMD
  • 156. Future Work. Looking at write/load tradeoffs in column-stores: read-only vs. batch loads vs. trickle updates vs. OLTP.
  • 157. Updates (1/3). Column-stores are update-in-place averse. In-place: I/O for each column, + re-compression, + multiple sorted replicas, + sparse tree indices. Update-in-place is infeasible!
  • 158. Updates (2/3). Column-stores use differential mechanisms instead: differential lists/files or more advanced structures (e.g. PDTs); updates are buffered in RAM and merged on each query; checkpointing merges differences in bulk, sequentially. I/O trends favor this anyway: trade RAM for converting random into sequential I/O; this trade is also needed in Flash (do not write randomly!). How high a load can it sustain? Depends on available RAM for buffering (how long until full); the checkpoint must be done within that time; the longer it can run, the less it disturbs queries. Using Flash for buffering differences buys a lot of time: hundreds of GBs of differences per server.
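The differential mechanism described on this slide can be sketched as below. This is a deliberately simple illustration under stated assumptions, not a PDT: the class and method names are hypothetical, the "base" is a plain list standing in for a compressed read-optimized column, and updates and deletes address positions in the base only. Queries merge the in-RAM differences on the fly, and a checkpoint rewrites the base in one sequential pass.

```python
class DifferentialColumn:
    """One column with a read-optimized base and in-RAM differences."""

    def __init__(self, values):
        self.base = list(values)   # only rewritten at checkpoint time
        self.updates = {}          # base position -> new value
        self.deletes = set()       # deleted base positions
        self.inserts = []          # values appended since the last checkpoint

    def update(self, pos, value):
        self.updates[pos] = value  # buffered, no I/O on the base

    def delete(self, pos):
        self.deletes.add(pos)

    def insert(self, value):
        self.inserts.append(value)

    def scan(self):
        """Merge base and differences on each query."""
        for pos, value in enumerate(self.base):
            if pos in self.deletes:
                continue
            yield self.updates.get(pos, value)
        yield from self.inserts

    def checkpoint(self):
        """Fold all differences into a new base in one sequential pass."""
        self.base = list(self.scan())
        self.updates, self.deletes, self.inserts = {}, set(), []


col = DifferentialColumn([10, 20, 30])
col.update(1, 25)
col.delete(0)
col.insert(40)
before = list(col.scan())   # merged view; col.base is still [10, 20, 30]
col.checkpoint()            # base now holds the merged values
```

On the sizing question, one rough way to read the slide: if differences accumulate at a rate of r bytes per second and the buffer holds B bytes, the buffer fills after about B / r seconds, so a checkpoint has to complete within that window.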
  • 159. Updates (3/3). Differential transactions are favored by hardware trends. Snapshot semantics are accepted by the user community, and can always be converted to serializable (“Serializable Isolation For Snapshot Databases”, Alomari, Cahill, Fekete, Roehm, SIGMOD’08). → Row stores could also use differential transactions and be efficient! → Implies a departure from ARIES. → Implies a full rewrite. My conclusion: a system that combines rows and columns needs differentially implemented transactions. Starting from a pure column-store, this is a limited add-on. Starting from a pure row-store, this implies a full rewrite.
  • 160. Future Work. Looking at write/load tradeoffs in column-stores: read-only vs. batch loads vs. trickle updates vs. OLTP. Database design for column-stores. Column-store specific optimizers: compression/materialization/join tricks → cost models? Hybrid column-row systems: can row-stores learn new column tricks? Study of the minimal number of changes one needs to make to a row-store to get the majority of the benefits of a column-store. Alternative: add features to column-stores that make them more like row-stores.
  • 161. Conclusion. Columnar techniques provide clear benefits for: data warehousing, BI; information retrieval, graphs, e-science. A number of crucial techniques make them effective; without these, existing row systems do not benefit. Row-stores and column-stores could be combined: row-stores may adopt some column-store techniques; column-stores add row-store (or PAX) functionality. Many open issues to do research on!