1.
C-Store: A Column-oriented DBMS
By
New England Database Group
2.
M.I.T
Current DBMS Gold Standard
Store fields in one record contiguously on disk
Use B-tree indexing
Use small (e.g. 4K) disk blocks
Align fields on byte or word boundaries
Conventional (row-oriented) query optimizer
and executor (technology from 1979)
Aries-style transactions
3.
Terminology -- “Row Store”
[Figure: disk layout of Records 1 to 4, each record stored as one contiguous unit]
E.g. DB2, Oracle, Sybase, SQLServer, …
4.
Row Stores are Write Optimized
Can insert and delete a record in one physical
write
Good for on-line transaction processing (OLTP)
But not for read mostly applications
Data warehouses
CRM
5.
Elephants Have Extended Row Stores
With Bitmap indices
Better sequential read
Integration of “datacube” products
Materialized views
But there may be a better idea…….
7.
At 100K Feet….
Ad-hoc queries read 2 columns out of 20
In a very large warehouse, Fact table is
rarely clustered correctly
Column store reads 10% of what a row
store reads
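The 10% figure is simple arithmetic: a query that touches 2 of 20 columns needs only 2/20 of the bytes a full-row scan would read. A minimal sketch of that estimate, assuming equal-width, uncompressed columns (the function name is illustrative and not part of C-Store):

```python
def fraction_of_row_store_io(columns_needed: int, total_columns: int) -> float:
    """Share of a full-row scan that a column scan has to read.

    Assumption: equal-width, uncompressed columns. Real ratios depend on
    column widths and on how well each column compresses.
    """
    if not 0 < columns_needed <= total_columns:
        raise ValueError("columns_needed must be between 1 and total_columns")
    return columns_needed / total_columns


ratio = fraction_of_row_store_io(2, 20)
print(f"{ratio:.0%}")  # prints 10%
```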
8.
C-Store (Column Store) Project
Brandeis/Brown/MIT/UMass-Boston
project
Usual suspects participating
Enough coded to get performance
numbers for some queries
Complete status later
9.
We Build on Previous Pioneering
Work….
Sybase IQ (early ’90s)
Monet (see CIDR ’05 for the most recent
description)
10.
C-Store – a Hybrid Store
Write-optimized column store (much like Monet)
Tuple mover (batch rebuilder)
Read-optimized column store (what we have been talking about so far)
11.
C-Store Technical Ideas
Code the columns to save space
No alignment
Big disk blocks
Only materialized views (perhaps many)
Focus on Sorting not indexing
Automatic physical DBMS design
12.
C-store (Column Store) Technical
Ideas
Innovative redundancy
Redundant storage of elements of a table in several
overlapping projections in different orders
High availability, K-safety, overlapping projections
Xacts – but no need for Mohan
(i.e. no ARIES-style, WAL-based recovery)
Avoid 2PC, no prepare, no locking
Data ordered on anything, Not just time
Column optimizer and executor
13.
Data Model
Projection
Segment
Storage Keys - tuple
Join indices
– (sid, storage key)
14.
Code the Columns
Work hard to shrink space
Use extra space for multiple orders
Fundamentally easier than in a row store
E.g. RLE works well
type 1, self order, few distinct values (v, f, n)
type 2, foreign order, few distinct values (v, b)
type 3, self order, many distinct values (delta)
type 4, foreign order, many distinct values (unencoded, or densepack B-tree)
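As an illustration of type 1 coding, the sketch below run-length encodes a column that is sorted on itself into (v, f, n) triples: the value, the position of its first occurrence, and the number of occurrences. It assumes 0-based positions, and the function names are hypothetical rather than taken from the C-Store code base.

```python
from itertools import groupby


def rle_encode(column):
    """Encode a self-ordered column as (value, first_position, run_length) triples."""
    triples = []
    position = 0  # assumption: positions are 0-based
    for value, run in groupby(column):
        run_length = sum(1 for _ in run)
        triples.append((value, position, run_length))
        position += run_length
    return triples


def rle_decode(triples):
    """Expand (v, f, n) triples back into the original column."""
    column = []
    for value, _first, run_length in triples:
        column.extend([value] * run_length)
    return column


states = ["CA", "CA", "CA", "MA", "MA", "NY"]
encoded = rle_encode(states)  # [("CA", 0, 3), ("MA", 3, 2), ("NY", 5, 1)]
```

Six stored values shrink to three triples here; the saving grows with the length of each run, which is why this coding suits sorted columns with few distinct values.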
15.
Different Indexing
Sequential, few values: RLE encoded; conventional B-tree at the value level
Sequential, many values: delta encoded; conventional B-tree at the block level
Non-sequential, few values: bitmap per value
Non-sequential, many values: conventional Gzip; conventional B-tree at the block level
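Two of the codings above can be sketched in a few lines: a bitmap per value for an unsorted column with few distinct values, and delta coding for a sorted column with many distinct values. This is a toy illustration under simplifying assumptions (bitmaps kept as strings, deltas as plain integers, no block structure); the function names are hypothetical.

```python
def bitmap_encode(column):
    """Return {value: bit string}, with one bit per row position."""
    bitmaps = {}
    for position, value in enumerate(column):
        bits = bitmaps.setdefault(value, [0] * len(column))
        bits[position] = 1
    return {value: "".join(map(str, bits)) for value, bits in bitmaps.items()}


def delta_encode(sorted_column):
    """Store the first value, then the difference from each previous value."""
    if not sorted_column:
        return []
    deltas = [sorted_column[0]]
    for previous, current in zip(sorted_column, sorted_column[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the sorted column by accumulating the deltas."""
    column, running = [], 0
    for delta in deltas:
        running += delta
        column.append(running)
    return column


bitmaps = bitmap_encode([0, 0, 1, 1, 2, 1, 0, 2, 1])
deltas = delta_encode([1, 4, 7, 7, 8, 12])  # [1, 3, 3, 0, 1, 4]
```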
16.
No Alignment
Densepack columns
E.g. a 5 bit field takes 5 bits
Current CPU speed going up faster than
disk bandwidth
Faster to shift data in CPU than to
waste disk bandwidth
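To make "a 5 bit field takes 5 bits" concrete, here is a small sketch that packs fixed-width fields back to back with no alignment padding between them, and unpacks them by shifting and masking. It is an assumption-laden illustration: the helper names are invented, and only the final byte is padded.

```python
def pack(values, width):
    """Pack non-negative integers using exactly `width` bits each."""
    accumulator = 0
    for value in values:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        accumulator = (accumulator << width) | value
    total_bits = width * len(values)
    padding = (-total_bits) % 8  # pad only the last byte
    accumulator <<= padding
    return accumulator.to_bytes((total_bits + padding) // 8, "big")


def unpack(data, width, count):
    """Recover `count` fields of `width` bits by shifting and masking."""
    accumulator = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << width) - 1
    return [
        (accumulator >> (total_bits - width * (i + 1))) & mask
        for i in range(count)
    ]


fields = [3, 31, 0, 17, 8, 21, 9, 30]
packed = pack(fields, 5)  # 8 fields x 5 bits = 40 bits = 5 bytes
```

Byte-aligned storage would spend 8 bytes on the same eight fields; the price of the saving is the extra shift and mask work in `unpack`, which is the trade the slide argues for.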
17.
Big Disk Blocks
Tunable
Big (minimum size is 64K)
18.
Only Materialized Views
Projection (materialized view) is some
number of columns from a fact table
Plus columns in a dimension table – with
a 1-n join between Fact and Dimension
table
Stored in order of a storage key(s)
Several may be stored!!!!!
With a permutation, if necessary, to map
between them
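The idea of several projections plus a permutation can be sketched as follows. Each projection stores a subset of columns sorted on its own key, and a permutation maps a position in one projection to the position of the same logical row in another. The table, column names, and helper functions are all hypothetical.

```python
def build_projection(rows, columns, sort_key):
    """Return (column arrays, order), where order[i] is the source row at position i."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][sort_key])
    arrays = {name: [rows[i][name] for i in order] for name in columns}
    return arrays, order


def permutation(order_from, order_to):
    """perm[i] is the position in the target projection of the row at position i."""
    position_in_target = {row_id: pos for pos, row_id in enumerate(order_to)}
    return [position_in_target[row_id] for row_id in order_from]


rows = [
    {"region": "east", "amount": 30, "name": "bob"},
    {"region": "west", "amount": 10, "name": "amy"},
    {"region": "east", "amount": 20, "name": "cat"},
]
by_amount, order_a = build_projection(rows, ["amount", "region"], "amount")
by_name, order_n = build_projection(rows, ["name"], "name")
perm = permutation(order_a, order_n)

# Reassemble (amount, name) pairs by following the permutation.
pairs = [(by_amount["amount"][i], by_name["name"][perm[i]]) for i in range(len(rows))]
```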
19.
Only Materialized Views
Table (as the user specified it and sees
it) is not stored!
No secondary indexes (they are a one
column sorted MV plus a permutation, if
you really want one)
20.
Automatic Physical DBMS Design
Accept a “training set” of queries and a
space budget
Choose the MVs auto-magically
Re-optimize periodically based on a log
of the interactions
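The slides do not say how the designer picks MVs, so the following is only one plausible shape for the problem: a generic greedy heuristic that, given a training set and a space budget, repeatedly picks the candidate projection that covers the most still-uncovered queries per unit of space. It is not the C-Store or Vertica designer, and every name and number in it is an assumption.

```python
def choose_projections(candidates, queries, budget):
    """Greedy pick of projections under a space budget.

    candidates: {name: (size, set of columns)}, with size > 0
    queries:    list of column sets, one per training query
    A query counts as covered when one chosen projection holds all its columns.
    """
    chosen, remaining = [], budget
    uncovered = list(queries)
    while uncovered:
        best, best_score = None, 0.0
        for name, (size, columns) in candidates.items():
            if name in chosen or size > remaining:
                continue
            gain = sum(1 for query in uncovered if query <= columns)
            if gain / size > best_score:
                best, best_score = name, gain / size
        if best is None:  # nothing affordable helps any more
            break
        chosen.append(best)
        remaining -= candidates[best][0]
        uncovered = [q for q in uncovered if not q <= candidates[best][1]]
    return chosen


candidates = {
    "p_ab": (2, {"a", "b"}),
    "p_cd": (2, {"c", "d"}),
    "p_abcd": (5, {"a", "b", "c", "d"}),
}
training_set = [{"a"}, {"a", "b"}, {"c"}]
picked = choose_projections(candidates, training_set, budget=4)
```

A real designer would also weigh sort orders, compression, and query frequency; this sketch only shows how a training set and a budget can drive the choice.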
21.
Innovative Redundancy
Hardly any warehouse is recovered by a
redo from the log
Takes too long!
Store enough MVs at enough places to
ensure K-safety
Rebuild dead objects from elsewhere in
the network
K-safety is a DBMS-design problem!
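One way to read "K-safety is a DBMS-design problem" is as a check the designer must pass: after any K sites fail, the surviving sites must still hold every column. The sketch below tests exactly that, under the simplifying assumption that column coverage alone is enough to rebuild a table (it ignores segments, sort orders, and join indices). The placement and function name are hypothetical.

```python
from itertools import combinations


def is_k_safe(placement, columns, k):
    """True if every column survives the failure of any k sites.

    placement: {site: set of columns stored at that site}
    Checking exactly k failures is enough, because fewer failures
    leave a superset of the surviving columns.
    """
    sites = list(placement)
    if k > len(sites):
        return False
    for failed in combinations(sites, k):
        alive = set().union(*(placement[s] for s in sites if s not in failed))
        if not columns <= alive:
            return False
    return True


placement = {
    "site1": {"a", "b"},
    "site2": {"b", "c"},
    "site3": {"a", "c"},
}
one_safe = is_k_safe(placement, {"a", "b", "c"}, k=1)
two_safe = is_k_safe(placement, {"a", "b", "c"}, k=2)
```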
22.
XACTS – No Mohan
Undo from a log (that does not need to
be persistent)
Redo by rebuild from elsewhere in the
network
23.
XACTS – No Mohan
Snapshot isolation (run queries as of a
tunable time in the recent past)
To solve read-write conflicts
Distributed Xacts
Without a prepare message (no 2
phase commit)
24.
Updates & Transaction Mechanism
ET - Effective Time
IV - Insertion Vector, for WS only
DRV - Deleted Record Vector,
for Projection
TA - Timestamp Authority
– HWM (high water mark)
– LWM (low water mark): triggered by tuple
mover
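A minimal sketch of how these pieces could combine for a snapshot read: a query runs as of an effective time ET between LWM and HWM, and a record is visible if it was inserted at or before ET and not deleted at or before ET. The encoding used here (one epoch per record, with 0 meaning "not deleted") and the function name are assumptions made for illustration.

```python
def visible_positions(iv, drv, et, lwm, hwm):
    """Positions of records visible to a query running as of epoch `et`.

    iv[i]:  epoch in which record i was inserted
    drv[i]: epoch in which record i was deleted, or 0 if never deleted
    """
    if not lwm <= et <= hwm:
        raise ValueError("effective time must lie between LWM and HWM")
    return [
        position
        for position, (inserted, deleted) in enumerate(zip(iv, drv))
        if inserted <= et and (deleted == 0 or deleted > et)
    ]


iv = [1, 2, 3, 5]
drv = [0, 4, 0, 0]
as_of_3 = visible_positions(iv, drv, et=3, lwm=1, hwm=4)  # [0, 1, 2]
as_of_4 = visible_positions(iv, drv, et=4, lwm=1, hwm=4)  # [0, 2]
```

Because readers only consult these vectors for a past epoch, they do not conflict with writers working in the current one.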
25.
Recovery
K-Safety
Common Case - WS crash, RS ok
– recovery sql
– log when tlastmove(S)>tlastmove(Sr)
26.
Tuple Mover
MOP : Merge-Out Process
emit LWM
– The need of users who want historical access
– WS space constraints
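A rough sketch of a merge-out step, assuming a single sorted key column: WS records inserted at or before LWM are moved into RS by a sorted merge, records already deleted by LWM are dropped, and later insertions stay in WS. The record layouts and the function name are illustrative assumptions, not the actual MOP implementation.

```python
import heapq


def merge_out(rs, ws, lwm):
    """Move settled WS records into RS.

    rs: sorted list of (key, delete_epoch) pairs
    ws: list of (key, insert_epoch, delete_epoch); delete_epoch 0 = not deleted
    Returns (new_rs, remaining_ws).
    """
    movable = sorted(
        (key, deleted)
        for key, inserted, deleted in ws
        if inserted <= lwm and (deleted == 0 or deleted > lwm)
    )
    # Records inserted after LWM are not yet eligible and stay in WS.
    remaining_ws = [record for record in ws if record[1] > lwm]
    new_rs = list(heapq.merge(rs, movable))  # both inputs are already sorted
    return new_rs, remaining_ws


rs = [(1, 0), (4, 0), (9, 0)]
ws = [(3, 1, 0), (7, 2, 2), (5, 3, 0), (2, 1, 4)]
new_rs, remaining_ws = merge_out(rs, ws, lwm=2)
```

Key 7 was inserted and deleted by epoch 2, so it is discarded rather than moved. Key 2 carries its later deletion epoch into RS so that queries as of epochs 2 and 3 still see it.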
27.
Column Executor
Column operations
not row operations
Columns remain coded
if possible
LM
Late materialization
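Both points can be shown on RLE-coded data. The sketch below sums a column directly from its (value, first position, run length) triples without decompressing it, and evaluates a predicate on the coded column to get position ranges before fetching a second column only at those positions. Names and data are hypothetical, and positions are assumed to be 0-based.

```python
def rle_sum(triples):
    """Sum a column from its RLE triples: each run contributes value * length."""
    return sum(value * run_length for value, _first, run_length in triples)


def rle_select(triples, predicate):
    """Evaluate a predicate once per run and return matching position ranges."""
    return [
        range(first, first + run_length)
        for value, first, run_length in triples
        if predicate(value)
    ]


def materialize(position_ranges, other_column):
    """Late materialization: fetch another column only at qualifying positions."""
    return [other_column[p] for positions in position_ranges for p in positions]


price = [(10, 0, 3), (20, 3, 2), (30, 5, 1)]  # coded form of 10,10,10,20,20,30
customer = ["a", "b", "c", "d", "e", "f"]

total = rle_sum(price)                               # 100
matches = rle_select(price, lambda v: v >= 20)       # positions 3..5
names = materialize(matches, customer)               # ["d", "e", "f"]
```

The predicate runs three times instead of six, and the `customer` column is touched only for rows that survive the filter.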
28.
Column Optimizer
Chooses MVs on which to run the query
Most important task
Build in snowflake schemas
Which are simple to optimize without
exhaustive search
32.
Current Performance
100X popular row store in 40% of the
space
10X popular column store in 70% of the
space
7X popular row store in 1/6th of the space
Code available with BSD license
33.
Structure Going Forward
Vertica
Very well financed start-up to
commercialize C-store
Doing the heavy lifting
University Research
Funded by Vertica
34.
Vertica
Complete alpha system in December ‘05
Everything, including DBMS designer
With current performance!
Looking for early customers to work with
(see me if you are interested)
35.
University Research
Extension of algorithms to non-snowflake
schemas
Study of L2 cache performance
Study of coding strategies
Study of executor options
Study of recovery tactics
Non-cursor interface
Study of optimizer primitives
36.
Paper list
C-Store: A Column Oriented DBMS
– Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. VLDB, pages 553-564, 2005.
Integrating Compression and Execution in Column-Oriented Database Systems
– Daniel J. Abadi, Samuel R. Madden, and Miguel C. Ferreira. Proceedings of SIGMOD, June, 2006, Chicago, USA.
Performance Tradeoffs in Read-Optimized Databases
– Stavros Harizopoulos, Velen Liang, Daniel Abadi, and Samuel Madden. Proceedings of VLDB, September, 2006, Seoul, Korea.
Column-Stores For Wide and Sparse Data
– Daniel J. Abadi. Proceedings of CIDR, January, 2007, Asilomar, USA.
Materialization Strategies in a Column-Oriented DBMS
– Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden. Proceedings of ICDE, April, 2007, Istanbul, Turkey.
Scalable Semantic Web Data Management Using Vertical Partitioning
– Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. Proceedings of VLDB, September, 2007, Vienna, Austria.
Column-Stores vs. Row-Stores: How Different Are They Really?
– Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. In Proceedings of SIGMOD, 2008, Vancouver, Canada.