Improve Data Warehouse Query Performance up to 100x with SQL Server 2012 Columnstore Indexes

COLUMNSTORE INDEXES

SQL Server 2012

Denis Reznik
The Frayman Group
denisreznik@live.ru

Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• CS indexes Internals
• Adding data to Columnstore index

Row Store and Column Store

In row store, data is stored tuple by tuple.
In column store, data is stored column by column.


Most queries do not process all the attributes of a particular relation.

[Figure: a table with columns id, name, address, city, state, age, with name and address called out]

SELECT c.name, c.address
FROM Customers c
WHERE c.region = 'Moscow'
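
The effect of the two layouts on the query above can be sketched in Python. This is only an illustration: the sample rows, the `region` values and the "values touched" counter are assumptions made up for the example, and nothing here reflects SQL Server's actual storage format.

```python
# Hypothetical sample data; column names follow the slide, values are made up.
rows = [
    {"id": 1, "name": "Ann", "address": "1 Main St", "region": "Moscow", "age": 30},
    {"id": 2, "name": "Bob", "address": "2 Oak Ave", "region": "Kyiv", "age": 41},
    {"id": 3, "name": "Eve", "address": "3 Elm Rd", "region": "Moscow", "age": 25},
]

# Column store: one list per attribute, aligned by row position.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def query_row_store(rows):
    """Scan whole tuples: every attribute of every row is read."""
    touched = 0
    result = []
    for row in rows:
        touched += len(row)  # the full tuple comes along, needed or not
        if row["region"] == "Moscow":
            result.append((row["name"], row["address"]))
    return result, touched


def query_column_store(columns):
    """Read only the three columns the query mentions."""
    needed = ("region", "name", "address")
    touched = sum(len(columns[name]) for name in needed)
    result = [
        (name, address)
        for region, name, address in zip(*(columns[n] for n in needed))
        if region == "Moscow"
    ]
    return result, touched


row_result, row_touched = query_row_store(rows)
col_result, col_touched = query_column_store(columns)
print(row_result == col_result, row_touched, col_touched)
```

Both functions return the same rows, but the column layout reads three of the five attributes instead of all of them. The gap widens as the table gets wider.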



Row Store
(+) Easy to add/modify a record
(-) Might read in unnecessary data

Column Store
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses

So column stores are suitable for read-mostly, read-intensive, large data repositories.

Compression

Trades I/O for CPU
Higher data value locality in column stores
Techniques such as run-length encoding are far more useful
Schemes
    Null suppression
    Dictionary encoding
    Run-length encoding
    Bit-vector encoding
    Heavyweight schemes
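
Two of the schemes listed above, dictionary encoding and run-length encoding, can be sketched in a few lines of Python. The function names and the sample `state` column are hypothetical, and this is a teaching sketch rather than the VertiPaq algorithm.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer id.

    Returns (dictionary, ids), where dictionary[i] is the value for id i.
    """
    dictionary = []
    index = {}
    ids = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A made-up column with few distinct values and long runs.
state = ["CA", "CA", "CA", "NY", "NY", "CA", "TX", "TX", "TX", "TX"]
dictionary, ids = dictionary_encode(state)
runs = run_length_encode(ids)
print(dictionary, runs)
```

Values from a single column tend to repeat and to sit next to each other, so both schemes pay off far more often than they would on row-by-row data.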


Columnar storage structure

C1 C2 C3 C4 C5 C6

Uses VertiPaq compression

Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes

Improved Data Warehouse Query performance

Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
Performance improvements for “typical” data warehouse queries from 10x to 100x
Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables

Good Candidates for Columnstore Indexing
Table candidates:
    Very large fact tables (for example, billions of rows)
    Larger dimension tables (millions of rows) with compression-friendly column data
    If unsure, it is easy to create a columnstore index and test the impact on your query workload
Query candidates (against a table with a columnstore index):
    Scans rather than seeks (columnstore indexes don't support seek operations)
    Aggregated results far smaller than the table size
    Joins to smaller dimension tables
    Filtering on fact / dimension tables (star schema pattern)
    Subset of columns (being selective in columns versus returning ALL columns)

Creating a columnstore index

T-SQL

SSMS


Defining the Columnstore Index

Columnstore index is nonclustered (secondary)
Base table can be clustered index or heap
One CS index per table
Multiple other nonclustered (B-tree) indexes allowed
    But may not be needed
CS index must be partition-aligned if table is partitioned

[Diagram: base table (clustered index OR heap) with nonclustered indexes, a nonclustered columnstore index, an indexed view and a filtered index]

Column Segments and Dictionaries

[Diagram: columns C1 to C6 split horizontally into sets of about 1M rows, segment 1 … segment N; one column's portion of a set is a column segment; dictionaries are stored alongside]

Memory management

• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• Columnstore segments flow between disk and memory
SELECT C2, SUM(C4)
FROM T
GROUP BY C2;

[Diagram: segments of columns T.C1 to T.C4 on disk; the T.C2 and T.C4 segments needed by the query are brought into memory]

Look inside Columnstore Indexes

xVelocity
Microsoft SQL Server family of memory-optimized and in-memory technologies
    xVelocity In-Memory Analytics Engine
    xVelocity Memory-Optimized Columnstore Indexes

The xVelocity engine is designed with 3 principles in mind:
Performance, Performance, Performance!

How Are These Performance Gains Achieved?
Two complementary technologies:
    Storage
        Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).
    New “batch mode” execution
        Vector-based query execution capability
        Data can then be processed in batches versus row-by-row
Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O

Batch mode processing
Process ~1000 rows at a time
Vector operators implemented
Greatly reduced CPU time (7 to 40X)

[Diagram: batch object made up of column vectors and a bitmap of qualifying rows]
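
The idea of a batch object, column vectors plus a bitmap of qualifying rows, can be sketched in Python as follows. All names (`batches`, `filter_batch`, `sum_where`) and the sample data are hypothetical, and the sketch shows only the shape of the technique, not the query processor's implementation or its performance.

```python
BATCH_SIZE = 1000  # roughly the batch size quoted on the slide


def batches(columns, size=BATCH_SIZE):
    """Yield batches: dicts of column vectors covering up to `size` rows."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, size):
        yield {name: vec[start:start + size] for name, vec in columns.items()}


def filter_batch(batch, column, predicate):
    """Filter operator: mark qualifying rows in a bitmap instead of copying rows."""
    return [predicate(value) for value in batch[column]]


def sum_batch(batch, column, bitmap):
    """Aggregate one column vector, skipping rows the bitmap has ruled out."""
    return sum(value for value, keep in zip(batch[column], bitmap) if keep)


def sum_where(columns, sum_column, filter_column, predicate):
    """SELECT SUM(sum_column) WHERE predicate(filter_column), one batch at a time."""
    total = 0
    for batch in batches(columns):
        bitmap = filter_batch(batch, filter_column, predicate)
        total += sum_batch(batch, sum_column, bitmap)
    return total


# Made-up data: 2500 rows, so three batches (1000, 1000, 500).
columns = {"qty": list(range(2500)), "price": [2] * 2500}
total = sum_where(columns, "price", "qty", lambda q: q >= 2000)
print(total)
```

Each operator is invoked once per batch rather than once per row, which is where the reduction in per-row overhead comes from.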

Segment Elimination

select Date, count(*)
from dbo.Purchase
where Date >= 20120201
group by Date

column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228
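
Using the segment metadata from the table above, the elimination decision for `Date >= 20120201` can be sketched in Python. The `SegmentMeta` class and `segments_to_scan` function are hypothetical names for illustration. The check itself only compares the predicate's lower bound with each segment's maximum value.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    column_id: int
    segment_id: int
    min_data_id: int
    max_data_id: int


# Metadata rows copied from the slide.
segments = [
    SegmentMeta(1, 1, 20120101, 20120131),
    SegmentMeta(1, 2, 20120115, 20120215),
    SegmentMeta(1, 3, 20120201, 20120228),
]


def segments_to_scan(segments, lower_bound):
    """Return ids of segments that may hold values >= lower_bound.

    A segment whose maximum is below the bound cannot contain a match,
    so it is skipped without being read.
    """
    return [s.segment_id for s in segments if s.max_data_id >= lower_bound]


to_scan = segments_to_scan(segments, 20120201)
print(to_scan)
```

Segment 1 tops out at 20120131 and is skipped. Segment 2 overlaps the range and must still be scanned, even though part of it precedes the bound.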

Columnstore format + batch mode Variations
    Columnstore indexing alone + traditional row mode in Query Processor
    Columnstore indexing + batch mode in Query Processor
    Columnstore indexing + hybrid of batch and traditional row mode in Query Processor

Plan operators supported in batch mode

Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build

Query processing with Columnstore Indexes

Maintaining Data in a Columnstore Index

Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
ALTER INDEX REBUILD / REORGANIZE not allowed
How can I modify index data?
    Drop columnstore index / make modifications / add columnstore index
    UNION ALL (but be sure to validate performance)
    Partition switches (IN and OUT)

Insert data into table with Columnstore Index

Summary

SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
    10x to 100x performance improvement depending on the schema and query
    I/O reduction and memory savings through columnstore compressed storage
    CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
    No need to create intermediate tables, aggregates, pre-processing and cubes
Interoperability with partitioning

Resources

Columnar Storage in SQL Server 2012 (PDF)
SQL Server Columnstore Performance Tuning
Inside the SQL Server 2012 Columnstore Indexes
24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
SQL Server Columnstore Performance Tuning (video)


SQL SERVER 2012 - COLUMNSTORE INDEXES

Denis Reznik

Senior Database Architect at The Frayman Group

Microsoft SQL Server MVP

denisreznik@live.ru

@denisreznik

http://reznik.uneta.com.ua
