This document discusses the application of SQL Server columnstore indexes in business intelligence (BI) solutions. It begins with an overview of columnstore indexes in SQL Server 2012, explaining how they store and compress data column-wise for high performance. Practical cases are then presented where an energy trading company saw improvements by applying columnstore indexes to optimize ETL processing times, reduce fact table loading times, improve OLAP cube processing speeds, and reduce reporting query times. The document concludes with an overview of updated columnstore indexes in SQL Server 2014, including new clustered columnstore indexes that allow for updating data.
1. APPLICATION OF SQL SERVER COLUMNSTORE INDEXES IN BI-SOLUTIONS
Theme day: Modern Analytical Database Technology
28 October 2014, Aalborg Universitet
Christian Winther Kristensen
Managing consultant
cwk@rehfeld.dk
2. Agenda
• SQL Server columnstore index
• Practical case
• New updateable clustered columnstore in SQL Server 2014
• Comparison: Pros and cons
• Questions
03-11-2014
3. SQL Server columnstore index
• Introduced in SQL Server 2012
• Shares Microsoft xVelocity columnstore technology with the Analysis Services Tabular model and PowerPivot
• Highly compressed
• Memory optimized
• Not updateable: the underlying table is read-only!
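A minimal sketch of what "not updateable" means in practice on SQL Server 2012. The table dbo.FactSales and the index name ncci_FactSales are hypothetical and only serve the illustration; disabling and rebuilding the index is one possible workaround, not necessarily the one used in the cases later in this deck.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE dbo.FactSales (
    CustomerKey  int   NOT NULL,
    ProductKey   int   NOT NULL,
    OrderDateKey int   NOT NULL,
    SalesAmount  money NOT NULL
);
GO

-- SQL Server 2012 supports only nonclustered columnstore indexes.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
    ON dbo.FactSales (CustomerKey, ProductKey, OrderDateKey, SalesAmount);
GO

-- While the index is enabled the table is read-only, so this INSERT is rejected.
INSERT INTO dbo.FactSales (CustomerKey, ProductKey, OrderDateKey, SalesAmount)
VALUES (1, 106, 20101107, 30.00);
GO

-- Workaround: disable the index, change the data, then rebuild the index.
ALTER INDEX ncci_FactSales ON dbo.FactSales DISABLE;
INSERT INTO dbo.FactSales (CustomerKey, ProductKey, OrderDateKey, SalesAmount)
VALUES (1, 106, 20101107, 30.00);
ALTER INDEX ncci_FactSales ON dbo.FactSales REBUILD;
GO
```

The GO separators keep the failing INSERT in its own batch, so the statements after it still run when the script is executed as a whole.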
4. Star schema
Diagram: FactSales at the center of the star, surrounded by its dimension tables (DimCustomer, DimDate, DimEmployee, DimStore).
FactSales ( CustomerKey int
, ProductKey int
, EmployeeKey int
, StoreKey int
, OrderDateKey int
, SalesAmount money
)
-- note: lots of ints in fact tables
DimCustomer ( CustomerKey int
, FirstName nvarchar(50)
, LastName nvarchar(50)
, Birthdate date
, EmailAddress nvarchar(50)
)
DimProduct (…
Best Practice: Integer keys!
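A typical query against such a star schema joins the fact table to a dimension on the integer key and aggregates a measure. The sketch below assumes the FactSales and DimCustomer tables exist in the dbo schema with the columns listed above, and that OrderDateKey holds dates as integers in yyyymmdd form; both are assumptions for illustration.

```sql
-- Total sales per customer for November 2010.
-- Assumes OrderDateKey is an integer in yyyymmdd form.
SELECT  c.CustomerKey,
        c.LastName,
        SUM(f.SalesAmount) AS TotalSales
FROM    dbo.FactSales   AS f
JOIN    dbo.DimCustomer AS c
        ON c.CustomerKey = f.CustomerKey   -- integer key join
WHERE   f.OrderDateKey BETWEEN 20101101 AND 20101130
GROUP BY c.CustomerKey, c.LastName
ORDER BY TotalSales DESC;
```

The query touches only three of the six fact table columns, which is the access pattern that column-wise storage favours.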
5. How do columnstore indexes optimize performance?
…
Heaps and B-trees store data row-wise.
Columnstore indexes store data column-wise (diagram: columns C1 to C4).
• Each page stores data from a single column
• Highly compressed
– About 2x better than PAGE compression
– More data fits in memory
• Each column accessed independently
– Fetch only needed columns
– Can dramatically decrease I/O
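To see the compression effect on a concrete table, the space used by each index can be compared. This is a sketch that assumes a hypothetical table dbo.FactSales carrying both row-based indexes and a columnstore index; the DMV and catalog view are standard, the table name is not from the slides.

```sql
-- Space used per index (or heap) on a hypothetical table dbo.FactSales.
SELECT  i.name                               AS index_name,   -- NULL for a heap
        i.type_desc,
        SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb       -- pages are 8 KB
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.FactSales')
GROUP BY i.name, i.type_desc
ORDER BY used_mb DESC;
```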
6. Columnstore index architecture
• Row Group
– 1 million logically contiguous rows
• Column Segment
– Segment contains values from one column for a set of rows
– Segments for the same set of rows comprise a row group
– Segments are compressed
– Each segment stored in a separate LOB
– Segment is unit of transfer between disk and memory
Diagram: columns C1 to C6, with one segment and one row group marked.
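The segment structure can be inspected through the catalog view sys.column_store_segments. The sketch below assumes a hypothetical table dbo.FactSales with a columnstore index and summarises the segments per column.

```sql
-- Segments, rows and on-disk size per columnstore column
-- for a hypothetical table dbo.FactSales.
SELECT  s.column_id,
        COUNT(*)                        AS segment_count,
        SUM(s.row_count)                AS total_rows,
        SUM(s.on_disk_size) / 1048576.0 AS size_mb   -- on_disk_size is in bytes
FROM    sys.column_store_segments AS s
JOIN    sys.partitions AS p
        ON p.hobt_id = s.hobt_id
WHERE   p.object_id = OBJECT_ID(N'dbo.FactSales')
GROUP BY s.column_id
ORDER BY s.column_id;
```

Every column should report the same number of segments, one per row group, while size_mb shows how differently the columns compress.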
11. 4. Fetch only needed columns and row groups
Figure: two row groups of SalesTable, each with one segment per column. Values shown per segment:
Row group 1
OrderDateKey: 20101107, 20101108
ProductKey: 106, 103, 109
StoreKey: 01, 04, 03, 05, 02
RegionKey: 1, 2
Quantity: 6, 1, 2, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 25.00
Row group 2
OrderDateKey: 20101108, 20101109
ProductKey: 102, 106, 109, 103
StoreKey: 02, 03, 01, 04
RegionKey: 1, 2
Quantity: 1, 5, 4
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
SELECT ProductKey, SUM (SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey
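This query needs only three columns (ProductKey, SalesAmount, OrderDateKey), and a row group whose smallest OrderDateKey is 20101108 or later cannot contain qualifying rows, so it can be skipped. The per-segment minimum and maximum that make this possible are visible in the catalog. The sketch below assumes SalesTable lives in the dbo schema and has a columnstore index; note that min_data_id and max_data_id may hold encoded values rather than the raw column values, depending on the segment encoding.

```sql
-- Per-segment value range metadata for a table assumed to be dbo.SalesTable.
-- A segment can be skipped when its range cannot satisfy the WHERE clause.
SELECT  s.column_id,
        s.segment_id,
        s.row_count,
        s.min_data_id,   -- may be an encoded value, not the raw one
        s.max_data_id
FROM    sys.column_store_segments AS s
JOIN    sys.partitions AS p
        ON p.hobt_id = s.hobt_id
WHERE   p.object_id = OBJECT_ID(N'dbo.SalesTable')
ORDER BY s.column_id, s.segment_id;
```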
12. Practical case
• Scenario:
– Energy trading company migrates BI solution to SQL Server 2012
• Problems:
– ETL flow and intermediary calculations take too long
– Loading fact tables with many indexes is slow, and the indexes consume a lot of storage
– Processing of the Analysis Services OLAP cube is slow
– End user reporting on the relational data mart has long response times in certain scenarios
13. Solution 1: Optimize complex ETL calculations
Before optimization: 1 hour for 6 million rows
• Stage basic trade data: 5 min
• Do derived calculations: 50 min
• Load fact table: 5 min
After optimization: 13 min for 6 million rows
• Drop columnstore index: 0 min
• Stage basic trade data: 5 min
• Create columnstore index: 2 min
• Do derived calculations: 1 min
• Load fact table: 5 min
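The optimized flow can be sketched in T-SQL as follows. All object names (stg.Trade, src.Trade, stg.TradeDaily, ncci_stg_Trade) and columns are hypothetical, and the aggregation merely stands in for the real derived calculations, which the slides do not show.

```sql
-- 1. Drop the columnstore index so the staging table becomes writable.
IF EXISTS (SELECT 1
           FROM   sys.indexes
           WHERE  object_id = OBJECT_ID(N'stg.Trade')
             AND  name = N'ncci_stg_Trade')
    DROP INDEX ncci_stg_Trade ON stg.Trade;

-- 2. Stage basic trade data (placeholder for the real load).
TRUNCATE TABLE stg.Trade;
INSERT INTO stg.Trade (TradeKey, TradeDateKey, Volume, Price)
SELECT TradeKey, TradeDateKey, Volume, Price
FROM   src.Trade;

-- 3. Recreate the columnstore index on the columns the calculations read.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_stg_Trade
    ON stg.Trade (TradeKey, TradeDateKey, Volume, Price);

-- 4. Derived calculations can now scan the columnstore index.
--    SELECT ... INTO requires that stg.TradeDaily does not exist yet.
SELECT  TradeDateKey,
        SUM(Volume * Price) AS TradedValue
INTO    stg.TradeDaily
FROM    stg.Trade
GROUP BY TradeDateKey;
```

The extra two minutes spent building the index pay off because the calculation step then reads compressed column segments instead of full rows.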
14. Solution 2: Reduce fact load time and save disk space
Before optimization: 41/45 min for 20 million rows, 8 GB index space
• Drop non-clustered indexes: 1 min
• Load fact table: 25 min (45 min when the indexes are not dropped)
• Create non-clustered indexes: 15 min
After optimization: 32 min for 20 million rows, 1 GB index space
• Drop columnstore index: 0 min
• Load fact table: 25 min
• Create columnstore index: 7 min
Some queries got a bit slower!
15. Solution 3: Slow processing of OLAP cube
SSAS MOLAP cube partitioned like the fact table. 300 million rows in total.
Partition switching is used for the fact table load; on average 30 million rows change per day.
Before optimization: 1 hour for 30 million rows
• Load switch-in table: 30 min
• Switch partition to fact table: 0 min
• Process OLAP cube: 30 min
After optimization: 55 min for 30 million rows + better performance for other queries
• Drop columnstore index: 0 min
• Load switch-in table: 30 min
• Create columnstore index: 5 min
• Switch partition to fact table: 0 min
• Process OLAP cube: 20 min
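A sketch of the index creation and switch steps. The names dbo.FactTrade, dbo.FactTrade_SwitchIn, pfTradeDate, the column list and the date key 20141028 are all hypothetical. The switch also assumes that the switch-in table matches the fact table in structure, indexes and filegroup, carries a CHECK constraint that fits the target partition's range, and that the target partition is empty.

```sql
-- 1. Build the columnstore index on the freshly loaded switch-in table.
--    The partitioned fact table is assumed to have a matching columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactTrade_SwitchIn
    ON dbo.FactTrade_SwitchIn (TradeDateKey, CounterpartyKey, Volume, Price);

-- 2. Look up the target partition for the loaded date key.
DECLARE @p int = $PARTITION.pfTradeDate(20141028);

-- 3. Switch the table in. This is a metadata-only operation.
ALTER TABLE dbo.FactTrade_SwitchIn
    SWITCH TO dbo.FactTrade PARTITION @p;
```

Because the index is built on the switch-in table, the read-only fact table never has to be modified directly.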
16. Solution 3: Slow processing of OLAP cube
• Only a small time saving on cube processing…
• But what if the storage mode was changed from MOLAP to ROLAP or HOLAP?
• Small experiment
– Some OLAP queries got slower
– Processing got a lot faster, especially with ROLAP, which builds no aggregations
– Saved OLAP storage space
17. Solution 4: Reduce reporting query time
Before optimization: 210 seconds for a star schema join and aggregation
• Add columnstore index to fact table in ETL
After optimization: 10 seconds for the same query
21 X FASTER!
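One way to reproduce such a before/after comparison on the same data is the query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX, which makes the optimizer plan the query as if the columnstore index were absent. The sketch assumes a hypothetical table dbo.FactSales with a nonclustered columnstore index; it is not the query measured on the slide.

```sql
SET STATISTICS TIME ON;

-- Run 1: the optimizer is free to use the columnstore index.
SELECT  f.ProductKey,
        SUM(f.SalesAmount) AS TotalSales
FROM    dbo.FactSales AS f
GROUP BY f.ProductKey;

-- Run 2: same query, but the columnstore index is ignored.
SELECT  f.ProductKey,
        SUM(f.SalesAmount) AS TotalSales
FROM    dbo.FactSales AS f
GROUP BY f.ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

SET STATISTICS TIME OFF;
```

The elapsed times appear on the Messages tab. The same hint helps to diagnose the queries that became slower after row-based indexes were replaced.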
18. Columnstore in SQL 2014
• New: Clustered Columnstore
– Dependency on conventional b-tree structures has been removed
– Potential for significant disk space savings if workload is satisfied without conventional indexes
• Note: Non-clustered columnstore is still supported & is still a read-only structure
– Required if:
• Constraints are required
• Workload requires b-tree non-clustered indexes
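A minimal sketch of a clustered columnstore table on SQL Server 2014. The table dbo.FactSales2014 and the index name cci_FactSales2014 are hypothetical. A clustered columnstore index takes no column list because it always contains every column of the table.

```sql
-- SQL Server 2014 or later. Hypothetical table and index names.
CREATE TABLE dbo.FactSales2014 (
    CustomerKey  int   NOT NULL,
    ProductKey   int   NOT NULL,
    OrderDateKey int   NOT NULL,
    SalesAmount  money NOT NULL
);
GO

-- The columnstore becomes the table's only storage structure.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales2014
    ON dbo.FactSales2014;
GO

-- Unlike the nonclustered variant, the table stays writable.
INSERT INTO dbo.FactSales2014 (CustomerKey, ProductKey, OrderDateKey, SalesAmount)
VALUES (1, 106, 20101107, 30.00);

UPDATE dbo.FactSales2014
SET    SalesAmount = 35.00
WHERE  CustomerKey = 1 AND ProductKey = 106;
GO
```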
19. Columnstore in SQL 2014
• Fully Read/Write
– Less complicated ETL
– But partition switching & BULK INSERT remain best practices
• Data type support expanded:
– All data types except: (n)varchar(max), varbinary(max), XML, Spatial, CLR (blob datatypes)
20. Columnstore in SQL 2014
• “Batch mode” query plan improved
– New support for:
• All joins (including OUTER, HASH, SEMI (NOT IN, IN))
• UNION ALL
• Scalar aggregates
• “Mixed mode” plans
21. Columnstore in SQL 2014: Inserting & Updating Data
• Bulk insert
– Creates row groups of 1 million rows; the last row group is probably not full
– But if <100K rows, they are left in the Row Store
• Insert/Update
– Collects rows in the Row Store
• Tuple Mover
– When the Row Store reaches 1 million rows, it is converted to a Columnstore Row Group
– Runs every 5 minutes by default
– Started explicitly by ALTER INDEX <name> ON <table> REORGANIZE
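The state of each row group can be checked in sys.column_store_row_groups, available from SQL Server 2014. The sketch assumes a hypothetical table dbo.FactSales2014 with a clustered columnstore index named cci_FactSales2014.

```sql
-- List row groups: OPEN and CLOSED ones are still row store,
-- COMPRESSED ones are columnstore row groups.
SELECT  rg.partition_number,
        rg.row_group_id,
        rg.state_description,
        rg.total_rows,
        rg.deleted_rows,
        rg.size_in_bytes
FROM    sys.column_store_row_groups AS rg
WHERE   rg.object_id = OBJECT_ID(N'dbo.FactSales2014')
ORDER BY rg.partition_number, rg.row_group_id;

-- Start the tuple mover explicitly instead of waiting for its next run.
ALTER INDEX cci_FactSales2014 ON dbo.FactSales2014 REORGANIZE;
```

Running the SELECT before and after REORGANIZE shows which CLOSED row groups were compressed. Row groups that are still OPEN, meaning not yet full, stay in the row store.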
22. Comparison: Pros and cons
Non-clustered columnstore
Pros:
• Fastest for queries
• Allows other row-based indexes
Cons:
• Not updateable
• Uses more storage
• More complex ETL design
Clustered columnstore
Pros:
• Allows updating the table
• Easier ETL design
• Faster load
• Minimal storage usage
Cons:
• No unique or key constraints!
• No non-clustered indexes
• Requires periodic index maintenance