SQL Explore 2012 - Michael Zilberstein: ColumnStore Presentation Transcript
Boosting performance with Columnstore Indexes
Michael Zilberstein
DBArt Ltd
Michael@dbart.co.il
History
• Column-oriented databases:
  – Sybase IQ
  – Vertica
  – Aster Data
  – Greenplum
  – …
• Excel PowerPivot.
• VertiPaq.
• xVelocity Columnstore index.
[Diagram: table columns C1 C2 C3 C4 C5 C6]
• Uses VertiPaq compression.
Reduced IO
• Fetches only the needed columns from disk: SELECT C2, SUM(C3) … reads only C2 and C3; C1, C4, C5 and C6 are not read.
• Columns are compressed.
• Less IO, better buffer hit rates.
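The column-elimination idea on this slide can be illustrated with a toy in-memory model. This is a sketch only, assuming a hypothetical `ToyColumnStore` class that keeps each column in its own list and records which columns a query fetched; it is not how SQL Server stores data. The query is read as `SELECT C2, SUM(C3) … GROUP BY C2`, with the GROUP BY being an assumption since the slide elides the rest of the statement.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyColumnStore:
    """Hypothetical toy column store: every column is kept in its own list."""

    def __init__(self, rows: List[Dict[str, object]]) -> None:
        # Pivot row-oriented input into one list per column.
        self.columns: Dict[str, List[object]] = defaultdict(list)
        for row in rows:
            for name, value in row.items():
                self.columns[name].append(value)
        # Record which columns a query actually fetched.
        self.columns_read: Set[str] = set()

    def read(self, name: str) -> List[object]:
        """Fetch a single column; the other columns are never touched."""
        self.columns_read.add(name)
        return self.columns[name]


def select_c2_sum_c3(store: ToyColumnStore) -> Dict[object, int]:
    """Roughly SELECT C2, SUM(C3) ... GROUP BY C2 (the GROUP BY is assumed)."""
    totals: Dict[object, int] = defaultdict(int)
    for key, amount in zip(store.read("C2"), store.read("C3")):
        totals[key] += amount
    return dict(totals)


rows = [
    {"C1": 1, "C2": "a", "C3": 10, "C4": 0, "C5": 0, "C6": 0},
    {"C1": 2, "C2": "b", "C3": 5, "C4": 0, "C5": 0, "C6": 0},
    {"C1": 3, "C2": "a", "C3": 20, "C4": 0, "C5": 0, "C6": 0},
]
store = ToyColumnStore(rows)
result = select_c2_sum_c3(store)
print(result)                      # {'a': 30, 'b': 5}
print(sorted(store.columns_read))  # ['C2', 'C3']
```

Only C2 and C3 end up in `columns_read`, which mirrors the slide's point: a columnstore query pays IO only for the columns it references.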
New query execution technology
• Batch mode execution of some operations:
  – processes rows in batches;
  – groups of batch operations in the query plan.
• Better parallelism, better algorithms.
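As a rough illustration of the batch-mode idea, the sketch below chains three operators (scan, filter, aggregate) that pass whole batches of values to one another instead of one row at a time, and compares the result with a row-at-a-time loop. The operator names and the tiny `BATCH_SIZE` are hypothetical choices for the demo; the real batch size and internals of SQL Server's batch mode are not modeled here.

```python
from typing import Iterable, Iterator, List, Sequence

# Illustrative batch size only; an assumption, not SQL Server's real value.
BATCH_SIZE = 3


def scan(column: Sequence[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Batch-mode scan: hand the next operator a whole batch of values at a time."""
    for start in range(0, len(column), batch_size):
        yield list(column[start:start + batch_size])


def filter_greater_than(batches: Iterable[List[int]], threshold: int) -> Iterator[List[int]]:
    """Batch-mode filter: one call processes every value in the incoming batch."""
    for batch in batches:
        yield [value for value in batch if value > threshold]


def sum_aggregate(batches: Iterable[List[int]]) -> int:
    """Batch-mode aggregate: consumes filtered batches from the operator below."""
    return sum(sum(batch) for batch in batches)


def row_mode_sum(column: Sequence[int], threshold: int) -> int:
    """Row-mode equivalent for comparison: one value flows through at a time."""
    total = 0
    for value in column:
        if value > threshold:
            total += value
    return total


data = [5, 12, 7, 20, 3, 15, 9]
# scan -> filter -> aggregate form a small group of batch operators.
batch_total = sum_aggregate(filter_greater_than(scan(data), threshold=8))
print(batch_total, row_mode_sum(data, threshold=8))  # 56 56
```

Both paths give the same answer; the difference is how many times each operator is invoked, which is where batch processing saves per-row overhead.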
Dictionary-based compression
• Build the dictionary on the fly with all distinct values.
• Substitute non-selective values with an ID.
• Index in our example: 6 bits per row (enough for up to 64 distinct codes).
Original column (Year of Birth): 1996, 1975, 1975, 1948, 1932, …
Internal dictionary (Year of Birth → Code): 1996 → 1, 1975 → 15, 1948 → 50, 1932 → 58, … → 60
Compressed fact (Code): 1, 15, 15, 50, 58, …
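A minimal sketch of the dictionary encoding described on this slide, assuming sequential codes starting at 0 (the slide's codes 1, 15, 50, 58 come from a larger dictionary, so the exact numbers differ). The function names are hypothetical. `bits_per_code` shows why a largest code of 60 fits in 6 bits per row.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[int]) -> Tuple[Dict[int, int], List[int]]:
    """Build the dictionary on the fly and replace each value with its code."""
    dictionary: Dict[int, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in dictionary:
            # Sequential codes are an assumption of this sketch; the slide's
            # codes (1, 15, 50, 58, ...) come from a larger dictionary.
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return dictionary, codes


def bits_per_code(max_code: int) -> int:
    """Smallest bit width that can hold every code from 0 to max_code."""
    return max(1, max_code.bit_length())


def dictionary_decode(dictionary: Dict[int, int], codes: List[int]) -> List[int]:
    """Map codes back to the original values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


years = [1996, 1975, 1975, 1948, 1932]
dictionary, codes = dictionary_encode(years)
print(dictionary)                 # {1996: 0, 1975: 1, 1948: 2, 1932: 3}
print(codes)                      # [0, 1, 1, 2, 3]
print(bits_per_code(max(codes)))  # 2
print(bits_per_code(60))          # 6, enough for the code 60 shown on the slide
```

The repeated 1975 is stored once in the dictionary and twice as a small code, which is where the savings on low-cardinality ("non-selective") columns come from.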
Segments
[Diagram: columns C1 C2 C3 C4 C5 C6, split into sets of about 1M rows; each column's part of a set is one column segment]
• A column segment contains values from one column for a set of about 1M rows.
• Column segments are compressed.
• Each column segment is stored in a separate LOB.
• The column segment is the unit of transfer from disk.
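The segment layout can be sketched as follows, assuming a hypothetical `SEGMENT_ROWS` of 4 so the example stays small (the slide describes about 1M rows per segment). zlib stands in for VertiPaq compression, and each compressed `bytes` object plays the role of the separate LOB; both are assumptions for illustration only.

```python
import struct
import zlib
from typing import List

# Illustrative segment size; the slide describes about 1M rows per segment.
SEGMENT_ROWS = 4


def build_segments(column: List[int], segment_rows: int = SEGMENT_ROWS) -> List[bytes]:
    """Split one column into segments and compress each segment on its own.

    zlib is a stand-in for VertiPaq compression (an assumption of this sketch);
    each bytes object plays the role of the separate LOB holding one segment.
    """
    segments: List[bytes] = []
    for start in range(0, len(column), segment_rows):
        chunk = column[start:start + segment_rows]
        raw = struct.pack(f"<{len(chunk)}i", *chunk)  # 32-bit little-endian ints
        segments.append(zlib.compress(raw))
    return segments


def read_segment(blob: bytes) -> List[int]:
    """The segment is the unit of transfer: the whole blob is decompressed at once."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


c2 = list(range(100, 110))  # a 10-row column
segments = build_segments(c2)
print(len(segments))               # 3 segments: 4 + 4 + 2 rows
print(read_segment(segments[-1]))  # [108, 109]
```

Because each segment is compressed independently, reading any value means fetching and decompressing its whole segment, matching the "unit of transfer" bullet.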
Data Dictionary Views
New execution plan elements
Best practices / worst practices
• Best practices:
  – Put columnstore indexes on large tables only.
  – Include every column of the table in the columnstore index.
  – Structure your queries as star joins with grouping and aggregation as much as possible.
• Worst practices:
  – Avoid joining and/or filtering on string columns of the table that has the columnstore index.
  – Avoid OUTER JOIN, UNION ALL, IN/NOT IN.
  – Avoid JOINs between two fact tables.