Reduced IO
• Fetches only the needed columns from disk: for SELECT C2, SUM(C3) …, only columns C2 and C3 are read, while C1, C4, C5 and C6 are not.
• Columns are compressed.
• Less IO, better buffer hit rates.
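The IO saving can be illustrated with a minimal sketch. It assumes a toy table of six integer columns C1 to C6 (names taken from the slide), fixed-width 4-byte values, and that the elided part of the query groups by C2. All sizes and data are hypothetical, and compression is ignored here.

```python
# Toy model of column projection; sizes and data are hypothetical.
ROWS = 1000
COLUMNS = ["C1", "C2", "C3", "C4", "C5", "C6"]
BYTES_PER_VALUE = 4  # assumption: every value is a fixed-width 4-byte integer

# Columnar layout: one Python list per column.
table = {name: [(i + k) % 10 for i in range(ROWS)] for k, name in enumerate(COLUMNS)}


def bytes_read_row_store(rows, all_columns):
    """A row store reads whole rows, so every column is fetched."""
    return rows * len(all_columns) * BYTES_PER_VALUE


def bytes_read_column_store(rows, needed_columns):
    """A column store fetches only the columns the query references."""
    return rows * len(needed_columns) * BYTES_PER_VALUE


def select_c2_sum_c3(tbl):
    """SUM(C3) grouped by C2 (the grouping is an assumption); touches only C2 and C3."""
    sums = {}
    for c2, c3 in zip(tbl["C2"], tbl["C3"]):
        sums[c2] = sums.get(c2, 0) + c3
    return sums


print(bytes_read_row_store(ROWS, COLUMNS))          # 24000
print(bytes_read_column_store(ROWS, ["C2", "C3"]))  # 8000
```

Reading two of six columns cuts the bytes fetched to one third before compression is even taken into account; fewer pages read also means more of the useful data fits in the buffer pool.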
New query execution technology
• Batch mode execution of some operations
  – processes rows in batches
  – groups of batch operations in the query plan
• Better parallelism, better algorithms
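To make the row mode vs. batch mode contrast concrete, here is a minimal sketch of a filter plus SUM operator written both ways. The batch size is an arbitrary value chosen for illustration, and the function names are hypothetical; this is not SQL Server's actual operator interface. The point is only that batch mode pays the per-call overhead once per batch instead of once per row.

```python
from typing import Iterator, List, Tuple

BATCH_SIZE = 1000  # hypothetical batch size, chosen only for illustration


def row_mode_sum(values: List[int], threshold: int) -> Tuple[int, int]:
    """Row mode: each operator call handles a single row."""
    calls = 0
    total = 0
    for v in values:
        calls += 1           # one call per row
        if v > threshold:    # filter
            total += v       # aggregate
    return total, calls


def batches(values: List[int], size: int) -> Iterator[List[int]]:
    """Cut the input into consecutive batches of at most `size` rows."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def batch_mode_sum(values: List[int], threshold: int, size: int = BATCH_SIZE) -> Tuple[int, int]:
    """Batch mode: each operator call filters and aggregates a whole batch."""
    calls = 0
    total = 0
    for batch in batches(values, size):
        calls += 1  # one call per batch
        total += sum(v for v in batch if v > threshold)
    return total, calls
```

For 10,000 input rows both versions return the same total, but the row-mode version makes 10,000 operator calls while the batch-mode version makes 10.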
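Dictionary-based compression
• Build the dictionary on the fly from all distinct values of the column.
• Substitute the non-selective (frequently repeated) values with their dictionary ID.
• Index size in our example: 6 bits per row, since the largest code (60) fits in 6 bits (2^6 = 64).
Internal dictionary (Year of Birth → Code): 1996 → 1, 1975 → 15, 1948 → 50, 1932 → 58, …, up to code 60
Fact column (Year of Birth): 1996, 1975, 1975, 1948, 1932, …
Compressed fact column (Code): 1, 15, 15, 50, 58, …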
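A minimal sketch of the idea: build the dictionary on the fly, replace each value with its code, and size each stored code by the largest code in the dictionary. Codes are assigned in order of first appearance starting at 0, which is an assumption of this sketch; the slide's example codes (1, 15, 50, 58, …) are not assigned that way, so the concrete codes differ while the bits-per-row reasoning is the same.

```python
from typing import Dict, List


def build_dictionary(values: List[int]) -> Dict[int, int]:
    """Build the dictionary on the fly: one code per distinct value (first-seen order)."""
    dictionary: Dict[int, int] = {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
    return dictionary


def encode(values: List[int], dictionary: Dict[int, int]) -> List[int]:
    """Substitute each value with its dictionary code."""
    return [dictionary[v] for v in values]


def decode(codes: List[int], dictionary: Dict[int, int]) -> List[int]:
    """Map codes back to the original values."""
    reverse = {code: v for v, code in dictionary.items()}
    return [reverse[c] for c in codes]


def bits_per_row(dictionary: Dict[int, int]) -> int:
    """Bits needed to hold the largest code (codes run from 0 to len - 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


years = [1996, 1975, 1975, 1948, 1932]
d = build_dictionary(years)
codes = encode(years, d)
print(codes)            # [0, 1, 1, 2, 3]
print(bits_per_row(d))  # 2
```

With 60 distinct years, `bits_per_row` returns 6, matching the slide: each row then costs 6 bits instead of a full integer, and the year itself is stored only once, in the dictionary.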
Segments
• Each column (C1 … C6) is split into column segments.
• A column segment contains the values from one column for a set of about 1M rows.
• Column segments are compressed.
• Each column segment is stored in a separate LOB.
• The column segment is the unit of transfer from disk.
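The segment layout can be sketched as follows. This is a minimal model, assuming `zlib` as a stand-in compressor and a Python dict keyed by (column, segment id) as a stand-in for LOB storage; SQL Server's real on-disk format and compression are different. It shows the three properties on the slide: one column per segment, each segment compressed independently, and a whole segment as the unit that is read back.

```python
import array
import zlib
from typing import Dict, List, Tuple

SEGMENT_ROWS = 1_000_000  # "about 1M rows" per segment, per the slide


def build_segments(column_name: str, values: List[int],
                   segment_rows: int = SEGMENT_ROWS) -> Dict[Tuple[str, int], bytes]:
    """Split one column into segments and compress each segment separately.

    The returned dict stands in for LOB storage: (column, segment_id) -> compressed bytes.
    """
    store: Dict[Tuple[str, int], bytes] = {}
    for seg_id, start in enumerate(range(0, len(values), segment_rows)):
        chunk = array.array("i", values[start:start + segment_rows])
        store[(column_name, seg_id)] = zlib.compress(chunk.tobytes())
    return store


def read_segment(store: Dict[Tuple[str, int], bytes], column_name: str, seg_id: int) -> List[int]:
    """A segment is the unit of transfer: the whole blob is fetched and decompressed."""
    raw = zlib.decompress(store[(column_name, seg_id)])
    chunk = array.array("i")
    chunk.frombytes(raw)
    return chunk.tolist()
```

For a quick check, a small `segment_rows` such as 1000 keeps the data tiny: 2,500 values then produce three segments, and reading segment 2 returns only the last 500 values.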
Best practices / worst practices
• Best practices:
  – Put columnstore indexes on large tables only.
  – Include every column of the table in the columnstore index.
  – Structure your queries as star joins with grouping and aggregation as much as possible.
• Worst practices (avoid these):
  – Joining and/or filtering on string columns of the table that has the columnstore index.
  – OUTER JOIN, UNION ALL, IN/NOT IN.
  – Joins between two fact tables.
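The recommended query shape is roughly `SELECT d.Category, SUM(f.Amount) FROM FactSales f JOIN DimProduct d ON f.ProductId = d.ProductId WHERE d.Category IN (…) GROUP BY d.Category`. The minimal Python sketch below mimics that shape with a hash join. All table and column names are hypothetical, and this is not how SQL Server implements the join internally. Note that the string filter is applied to the small dimension table, and the fact table is joined only on an integer key, which keeps string handling away from the table with the columnstore index.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical star schema: one dimension table and one fact table.
dim_product: Dict[int, str] = {  # product_id -> category
    1: "Bikes", 2: "Bikes", 3: "Helmets",
}
fact_sales: List[Tuple[int, int]] = [  # (product_id, amount)
    (1, 100), (2, 250), (3, 40), (1, 60), (3, 10),
]


def star_join_sum(fact: List[Tuple[int, int]], dim: Dict[int, str],
                  wanted_categories: Set[str]) -> Dict[str, int]:
    """Filter the dimension, hash-join the fact rows on the integer key,
    then group by a dimension attribute and sum the fact measure."""
    # Build side: filter the small dimension table on its string column.
    build = {pid: cat for pid, cat in dim.items() if cat in wanted_categories}
    totals: Dict[str, int] = defaultdict(int)
    # Probe side: scan the large fact table, matching on the integer key only.
    for pid, amount in fact:
        cat = build.get(pid)
        if cat is not None:
            totals[cat] += amount
    return dict(totals)
```

Calling `star_join_sum(fact_sales, dim_product, {"Bikes"})` returns `{"Bikes": 410}`.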