Christian Winther Kristensen

APPLICATION OF SQL
SERVER COLUMNSTORE
INDEXES IN BI-SOLUTIONS
Theme day: Modern Analytical Database Technology
28 October 2014, Aalborg Universitet
Christian Winther Kristensen
Managing consultant
cwk@rehfeld.dk

Agenda
• SQL Server columnstore index
• Practical case
• New updateable clustered columnstore in SQL Server 2014
• Comparison: Pros and cons
• Questions
03-11-2014

SQL Server columnstore index
• Introduced in SQL Server 2012
• Shares Microsoft xVelocity columnstore technology with the Analysis Services Tabular model and PowerPivot
• Highly compressed
• Memory optimized
• Not updateable
– The underlying table is read-only!

Star schema
• Fact table: FactSales
• Dimension tables: DimCustomer, DimProduct, DimDate, DimEmployee, DimStore
FactSales ( CustomerKey int
, ProductKey int
, EmployeeKey int
, StoreKey int
, OrderDateKey int
, SalesAmount money
)
--note: lots of ints in fact tables
DimCustomer ( CustomerKey int
, FirstName nvarchar(50)
, LastName nvarchar(50)
, Birthdate date
, EmailAddress nvarchar(50)
)
DimProduct (…
Best Practice: Integer keys!

How do columnstore indexes optimize performance?
• Heaps and B-trees store data row-wise
• Columnstore indexes store data column-wise
– Each page stores data from a single column
• Highly compressed
– About 2x better than PAGE compression
– More data fits in memory
• Each column accessed independently
– Fetch only needed columns
– Can dramatically decrease I/O
[Figure: columns C1 to C4 stored separately]
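
The difference between the two layouts can be illustrated with a small Python sketch. It is a toy model, not how SQL Server stores pages: the table, the column names C1 to C4 and the "values touched" counter are assumptions made for illustration only.

```python
# Toy model: the same three-row table stored row-wise and column-wise.
# The data and the "touched" counter are illustrative assumptions.
rows = [
    {"C1": 1, "C2": 10, "C3": 100, "C4": 1000},
    {"C1": 2, "C2": 20, "C3": 200, "C4": 2000},
    {"C1": 3, "C2": 30, "C3": 300, "C4": 3000},
]

# Column-wise layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_c3_row_wise(rows):
    """Sum C3 from a row-wise layout; every value of every row is read."""
    total, touched = 0, 0
    for row in rows:
        touched += len(row)  # the whole row comes along
        total += row["C3"]
    return total, touched


def sum_c3_column_wise(columns):
    """Sum C3 from a column-wise layout; only the C3 values are read."""
    values = columns["C3"]
    return sum(values), len(values)


print(sum_c3_row_wise(rows))        # (600, 12)
print(sum_c3_column_wise(columns))  # (600, 3)
```

Both functions return the same sum, but the column-wise version touches a quarter of the values, which is the effect behind "fetch only needed columns".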

Columnstore index architecture
• Row Group
– 1 million logically contiguous rows
• Column Segment
– Segment contains values from one column for a set of rows
– Segments for the same set of rows comprise a row group
– Segments are compressed
– Each segment stored in a separate LOB
– Segment is unit of transfer between disk and memory
[Figure: a row group spanning columns C1 to C6, with one segment per column]

Columnstore index example
OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount
20101107 106 01 1 6 30.00
20101107 103 04 2 1 17.00
20101107 109 04 2 2 20.00
20101107 103 03 2 1 17.00
20101107 106 05 3 4 20.00
20101108 106 02 1 5 25.00
20101108 102 02 1 1 14.00
20101108 106 03 2 5 25.00
20101108 109 01 1 1 10.00
20101109 106 04 2 4 20.00
20101109 106 04 2 5 25.00
20101109 103 01 1 1 17.00

1. Horizontally partition (Row Groups)
Row group 1:
20101107 106 01 1 6 30.00
20101107 103 04 2 1 17.00
20101107 109 04 2 2 20.00
20101107 103 03 2 1 17.00
20101107 106 05 3 4 20.00
20101108 106 02 1 5 25.00
Row group 2:
20101108 102 02 1 1 14.00
20101108 106 03 2 5 25.00
20101108 109 01 1 1 10.00
20101109 106 04 2 4 20.00
20101109 106 04 2 5 25.00
20101109 103 01 1 1 17.00

2. Vertically partition via columns (segments)
Row group 1, one segment per column:
OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey: 106, 103, 109, 103, 106, 106
StoreKey: 01, 04, 04, 03, 05, 02
RegionKey: 1, 2, 2, 2, 3, 1
Quantity: 6, 1, 2, 1, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 17.00, 20.00, 25.00
Row group 2, one segment per column:
OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey: 102, 106, 109, 106, 106, 103
StoreKey: 02, 03, 01, 04, 04, 01
RegionKey: 1, 2, 1, 2, 2, 1
Quantity: 1, 5, 1, 4, 5, 1
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
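
Steps 1 and 2 can be reproduced on the example rows with a short Python sketch. It assumes a toy row group size of 6 rows to match the slides (real row groups hold about 1 million rows), writes StoreKey as plain integers, and uses hypothetical function names.

```python
# Toy reproduction of steps 1 and 2 on the example rows.
COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey",
           "RegionKey", "Quantity", "SalesAmount"]

ROWS = [
    (20101107, 106, 1, 1, 6, 30.00),
    (20101107, 103, 4, 2, 1, 17.00),
    (20101107, 109, 4, 2, 2, 20.00),
    (20101107, 103, 3, 2, 1, 17.00),
    (20101107, 106, 5, 3, 4, 20.00),
    (20101108, 106, 2, 1, 5, 25.00),
    (20101108, 102, 2, 1, 1, 14.00),
    (20101108, 106, 3, 2, 5, 25.00),
    (20101108, 109, 1, 1, 1, 10.00),
    (20101109, 106, 4, 2, 4, 20.00),
    (20101109, 106, 4, 2, 5, 25.00),
    (20101109, 103, 1, 1, 1, 17.00),
]

ROW_GROUP_SIZE = 6  # toy value chosen to match the slides


def to_row_groups(rows, size):
    """Step 1: cut the table horizontally into row groups."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_segments(row_group):
    """Step 2: cut one row group vertically into one segment per column."""
    # zip(*row_group) transposes the rows into columns.
    return {name: list(values)
            for name, values in zip(COLUMNS, zip(*row_group))}


row_groups = [to_segments(g) for g in to_row_groups(ROWS, ROW_GROUP_SIZE)]
print(row_groups[0]["ProductKey"])  # [106, 103, 109, 103, 106, 106]
print(row_groups[1]["Quantity"])    # [1, 5, 1, 4, 5, 1]
```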

3. Compress each segment*
Row group 1, segments after compression:
OrderDateKey: 20101107, 20101108
ProductKey: 106, 103, 109
StoreKey: 01, 04, 03, 05, 02
RegionKey: 1, 2
Quantity: 6, 1, 2, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 25.00
Row group 2, segments after compression:
OrderDateKey: 20101108, 20101109
ProductKey: 102, 106, 109, 103
StoreKey: 02, 03, 01, 04
RegionKey: 1, 2
Quantity: 1, 5, 4
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
Some segments will compress more than others
*Encoding and reordering not shown
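
Why segments with few distinct values compress best can be seen from a minimal dictionary-encoding sketch in Python. Dictionary encoding is used here as one general compression technique; the slides do not show the actual encoding, so this is an illustration and not SQL Server's exact algorithm. The function names are hypothetical.

```python
# Minimal dictionary encoding of one column segment (illustrative only).
def dictionary_encode(segment):
    """Replace each value by a small integer code into a dictionary."""
    dictionary, positions, codes = [], {}, []
    for value in segment:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original segment from dictionary and codes."""
    return [dictionary[code] for code in codes]


# OrderDateKey segment of row group 1: only two distinct values.
order_date_key = [20101107] * 5 + [20101108]
# SalesAmount segment of row group 2: almost all values differ.
sales_amount = [14.00, 25.00, 10.00, 20.00, 25.00, 17.00]

print(dictionary_encode(order_date_key))
# ([20101107, 20101108], [0, 0, 0, 0, 0, 1])
print(len(dictionary_encode(sales_amount)[0]))  # 5
```

The date segment shrinks to a two-entry dictionary plus tiny codes, while the amount segment still needs five dictionary entries for six rows, which matches the remark that some segments compress more than others.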

4. Fetch only needed columns and row groups
Row group 1:
OrderDateKey: 20101107, 20101108
ProductKey: 106, 103, 109
StoreKey: 01, 04, 03, 05, 02
RegionKey: 1, 2
Quantity: 6, 1, 2, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 25.00
Row group 2:
OrderDateKey: 20101108, 20101109
ProductKey: 102, 106, 109, 103
StoreKey: 02, 03, 01, 04
RegionKey: 1, 2
Quantity: 1, 5, 4
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
SELECT ProductKey, SUM (SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey
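
The query above only needs OrderDateKey, ProductKey and SalesAmount, and no row in the second row group has an OrderDateKey below 20101108. The Python sketch below mimics that behaviour on the example data. It is a simplified model: the minimum is computed on the fly here, whereas a real engine would keep such values as segment metadata, and the function name is hypothetical.

```python
from collections import defaultdict

# Only the three columns the query needs, per row group (uncompressed).
ROW_GROUPS = [
    {"OrderDateKey": [20101107] * 5 + [20101108],
     "ProductKey": [106, 103, 109, 103, 106, 106],
     "SalesAmount": [30.0, 17.0, 20.0, 17.0, 20.0, 25.0]},
    {"OrderDateKey": [20101108] * 3 + [20101109] * 3,
     "ProductKey": [102, 106, 109, 106, 106, 103],
     "SalesAmount": [14.0, 25.0, 10.0, 20.0, 25.0, 17.0]},
]


def sales_by_product_before(row_groups, date_limit):
    """SUM(SalesAmount) per ProductKey WHERE OrderDateKey < date_limit."""
    totals = defaultdict(float)
    scanned = 0
    for group in row_groups:
        dates = group["OrderDateKey"]
        # Skip the row group if even its smallest date fails the filter.
        if min(dates) >= date_limit:
            continue
        scanned += 1
        for date, product, amount in zip(
                dates, group["ProductKey"], group["SalesAmount"]):
            if date < date_limit:
                totals[product] += amount
    return dict(totals), scanned


result, scanned = sales_by_product_before(ROW_GROUPS, 20101108)
print(result)   # {106: 50.0, 103: 34.0, 109: 20.0}
print(scanned)  # 1
```

Only one of the two row groups is scanned, and within it only three of the six columns are read.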

Practical case
• Scenario:
– Energy trading company migrates BI solution to SQL Server 2012
• Problems:
– ETL flow and intermediary calculations take too long
– Loading fact tables with many indexes is slow, and the indexes consume a lot of storage
– Processing of the Analysis Services OLAP cube is slow
– End user reporting on the relational data mart has long response times in certain scenarios

Solution 1:
Optimize complex ETL calculations
• Before optimization: 1 hour for 6 million rows
– Stage basic trade data: 5 min
– Do derived calculations: 50 min
– Load fact table: 5 min
• After optimization: 13 min for 6 million rows
– Drop columnstore index: 0 min
– Stage basic trade data: 5 min
– Create columnstore index: 2 min
– Do derived calculations: 1 min
– Load fact table: 5 min

Solution 2: Reduce fact load time and save disk space
• Before optimization: 41/45 min for 20 million rows, 8 GB index space
– Drop non-clustered indexes: 1 min
– Load fact table: 25 min (45 min when not dropping the indexes)
– Create non-clustered indexes: 15 min
• After optimization: 32 min for 20 million rows, 1 GB index space
– Drop columnstore index: 0 min
– Load fact table: 25 min
– Create columnstore index: 7 min
• Some queries got a bit slower!

Solution 3:
Slow processing of OLAP cube
SSAS MOLAP cube with partitions like fact table. 300 million rows total.
Partition switching used for fact table load: on average 30 million rows change per day.
• Before optimization: 1 hour for 30 million rows
– Load switch-in table: 30 min
– Switch partition to fact table: 0 min
– Process OLAP cube: 30 min
• After optimization: 55 min for 30 million rows + better performance for other queries
– Drop columnstore index: 0 min
– Load switch-in table: 30 min
– Create columnstore index: 5 min
– Switch partition to fact table: 0 min
– Process OLAP cube: 20 min

Solution 3:
Slow processing of OLAP cube
• Only a small time saving on cube processing…
• But what if the storage mode was changed from MOLAP to ROLAP or HOLAP?
• Small experiment
– Some OLAP queries got slower
– Processing got a lot faster, especially with ROLAP because there are no aggregations
– Saved OLAP storage space

Solution 4:
Reduce reporting query time
• Before optimization: 210 seconds for doing star schema join and aggregation
• Change: add columnstore index to fact table in ETL
• After optimization: 10 seconds for doing same query
21 X FASTER!

Columnstore in SQL 2014
• New: Clustered Columnstore
– Dependency on conventional b-tree structures has been removed
– Potential for significant disk space savings if workload is satisfied without conventional indexes
• Note: Non-clustered columnstore is still supported & is still a read-only structure
– Required if:
• Constraints are required
• Workload requires b-tree non-clustered indexes

• Fully Read/Write
– Less complicated ETL
– But partition switching & BULK INSERT remain best practices
• Data type support expanded:
– All data types except: (n)varchar(max), varbinary(max), XML, Spatial, CLR (blob data types)

• “Batch mode” query plan improved
– New support for:
• All joins (including OUTER, HASH, SEMI (NOT IN, IN))
• UNION ALL
• Scalar aggregates
• “Mixed mode” plans

Columnstore in SQL 2014:
Inserting & Updating Data
• Bulk insert
– Creates row groups of 1 million rows; the last row group is probably not full
– But if the load has <100K rows, they are left in the Row Store
• Insert/Update
– Collects rows in the Row Store
• Tuple Mover
– When the Row Store reaches 1 million rows, it is converted to a Columnstore Row Group
– Runs every 5 minutes by default
– Started explicitly by ALTER INDEX <name> ON <table> REORGANIZE
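
The insert rules above can be sketched as a small Python model. It follows the thresholds stated on the slide (1 million rows per row group, 100K rows as the bulk-load cut-off) and simplifies everything else: the Row Store is a single counter, and the class and method names are hypothetical.

```python
# Simplified model of the load rules on the slide; all names are
# hypothetical and the Row Store is reduced to a single row counter.
ROW_GROUP_SIZE = 1_000_000  # rows per compressed row group (slide)
BULK_MIN_ROWS = 100_000     # smaller bulk loads stay in the Row Store (slide)


class ColumnstoreSketch:
    def __init__(self):
        self.compressed_row_groups = []  # row counts of compressed groups
        self.row_store = 0               # rows waiting in the Row Store

    def bulk_insert(self, n):
        """Bulk load n rows: full row groups first, then the remainder."""
        while n >= ROW_GROUP_SIZE:
            self.compressed_row_groups.append(ROW_GROUP_SIZE)
            n -= ROW_GROUP_SIZE
        if n >= BULK_MIN_ROWS:
            self.compressed_row_groups.append(n)  # last group, not full
        else:
            self.row_store += n  # too few rows to compress directly

    def insert(self, n=1):
        """Ordinary inserts always collect in the Row Store."""
        self.row_store += n

    def tuple_mover(self):
        """Convert full Row Store batches into compressed row groups."""
        while self.row_store >= ROW_GROUP_SIZE:
            self.compressed_row_groups.append(ROW_GROUP_SIZE)
            self.row_store -= ROW_GROUP_SIZE


cs = ColumnstoreSketch()
cs.bulk_insert(2_500_000)  # two full row groups plus one of 500,000 rows
cs.bulk_insert(50_000)     # below the cut-off: stays in the Row Store
cs.insert(950_000)         # the Row Store now holds 1,000,000 rows
cs.tuple_mover()           # the Row Store is converted to a row group
print(cs.compressed_row_groups)  # [1000000, 1000000, 500000, 1000000]
print(cs.row_store)              # 0
```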

Comparison: Pros and cons
Non-clustered columnstore
• Pros
– Fastest for queries
– Allows other row-based indexes
• Cons
– Not updateable
– Uses more storage
– More complex ETL design
Clustered columnstore
• Pros
– Allows updating the table
– Easier ETL design
– Faster load
– Minimal storage usage
• Cons
– No unique or key constraints!
– No non-clustered indexes
– Requires periodic index maintenance
