APPLICATION OF SQL SERVER COLUMNSTORE INDEXES IN BI SOLUTIONS
Theme day: Modern Analytical Database Technology
28 October 2014, Aalborg Universitet
Christian Winther Kristensen
Managing consultant
cwk@rehfeld.dk
Agenda 
• SQL Server columnstore index
• Practical case
• New updateable clustered columnstore in SQL Server 2014
• Comparison: Pros and cons 
• Questions 
03-11-2014
SQL Server columnstore index
• Introduced in SQL Server 2012
• Shares Microsoft's xVelocity columnstore technology with the Analysis Services Tabular model and PowerPivot
• Highly compressed
• Memory optimized
• Not updateable
– The underlying table becomes read-only while the index exists!
Star schema
[Figure: star schema with FactSales at the center, joined to DimCustomer, DimProduct, DimDate, DimEmployee and DimStore]
CREATE TABLE FactSales
( CustomerKey int
, ProductKey int
, EmployeeKey int
, StoreKey int
, OrderDateKey int
, SalesAmount money
);
-- note: lots of ints in fact tables
CREATE TABLE DimCustomer
( CustomerKey int
, FirstName nvarchar(50)
, LastName nvarchar(50)
, Birthdate date
, EmailAddress nvarchar(50)
);
DimProduct (…
Best Practice: Integer keys!
How do columnstore indexes optimize performance?
• Columnstore indexes store data column-wise
– Each page stores data from a single column
• Highly compressed
– About 2x better than PAGE compression
– More data fits in memory
• Each column is accessed independently
– Fetch only the needed columns
– Can dramatically decrease I/O
[Figure: columns C1 to C4 stored separately; heaps and B-trees, by contrast, store data row-wise]
Columnstore index architecture
• Row group
– Up to about 1 million (1,048,576) logically contiguous rows
• Column segment
– A segment contains the values from one column for a set of rows
– The segments for the same set of rows make up a row group
– Segments are compressed
– Each segment is stored in a separate LOB
– The segment is the unit of transfer between disk and memory
[Figure: a row group spanning columns C1 to C6; each column's slice of the row group is one segment]
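To make the row group size concrete, here is a small Python sketch of how many row groups and segments a load would produce in the best case. It assumes every row group is filled to the 1,048,576-row maximum except the last one; in practice SQL Server can produce smaller row groups (for example under memory pressure or when the index is built in parallel), so treat the numbers as an idealized illustration. The helper names `rowgroup_sizes` and `segment_count` are hypothetical, and 20 million rows is simply the fact load size mentioned in the practical case.

```python
# Idealized model: fill each row group to the maximum; the remainder goes in the last one.
MAX_ROWS_PER_ROWGROUP = 1_048_576  # 2**20, the row group size limit


def rowgroup_sizes(total_rows, max_rows=MAX_ROWS_PER_ROWGROUP):
    """Return the row count of each row group for an ideal, fully packed load."""
    full_groups, remainder = divmod(total_rows, max_rows)
    sizes = [max_rows] * full_groups
    if remainder:
        sizes.append(remainder)  # the last row group is usually not full
    return sizes


def segment_count(total_rows, column_count):
    """Each row group holds exactly one segment per column."""
    return len(rowgroup_sizes(total_rows)) * column_count


sizes = rowgroup_sizes(20_000_000)
print(len(sizes), sizes[-1])         # 20 77056
print(segment_count(20_000_000, 6))  # 120 (FactSales has six columns)
```

Real row group counts can only be equal or higher than this ideal, never lower.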
Columnstore index example
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
1. Horizontally partition (row groups)
Row group 1
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
Row group 2
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
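Step 1 is simply a chunking of the rows in load order. A minimal Python sketch, using the slides' illustrative row group size of 6 rather than the real limit of about 1 million rows; `horizontal_partition` is a hypothetical helper, not a SQL Server API.

```python
# The example table; StoreKey is kept as text to preserve the leading zero.
ROWS = [
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00),
    (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00),
    (20101108, 106, "02", 1, 5, 25.00),
    (20101108, 102, "02", 1, 1, 14.00),
    (20101108, 106, "03", 2, 5, 25.00),
    (20101108, 109, "01", 1, 1, 10.00),
    (20101109, 106, "04", 2, 4, 20.00),
    (20101109, 106, "04", 2, 5, 25.00),
    (20101109, 103, "01", 1, 1, 17.00),
]


def horizontal_partition(rows, rowgroup_size):
    """Split rows into consecutive chunks (row groups) of at most rowgroup_size rows."""
    return [rows[i:i + rowgroup_size] for i in range(0, len(rows), rowgroup_size)]


rowgroups = horizontal_partition(ROWS, 6)
print(len(rowgroups))     # 2
print(rowgroups[1][0])    # (20101108, 102, '02', 1, 1, 14.0)
```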
2. Vertically partition via columns (segments)
Row group 1
  OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
  ProductKey:   106, 103, 109, 103, 106, 106
  StoreKey:     01, 04, 04, 03, 05, 02
  RegionKey:    1, 2, 2, 2, 3, 1
  Quantity:     6, 1, 2, 1, 4, 5
  SalesAmount:  30.00, 17.00, 20.00, 17.00, 20.00, 25.00
Row group 2
  OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
  ProductKey:   102, 106, 109, 106, 106, 103
  StoreKey:     02, 03, 01, 04, 04, 01
  RegionKey:    1, 2, 1, 2, 2, 1
  Quantity:     1, 5, 1, 4, 5, 1
  SalesAmount:  14.00, 25.00, 10.00, 20.00, 25.00, 17.00
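Step 2 turns each row group into one segment per column, which amounts to transposing the row group. The Python sketch below does this for row group 1 only; the names `COLUMNS`, `ROWGROUP_1` and `vertical_partition` are illustrative, not part of any SQL Server interface.

```python
COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount")

# Row group 1 from the example (StoreKey kept as text to preserve the leading zero).
ROWGROUP_1 = [
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00),
    (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00),
    (20101108, 106, "02", 1, 5, 25.00),
]


def vertical_partition(rowgroup, columns):
    """Return {column name: list of that column's values}, i.e. one segment per column."""
    # zip(*rowgroup) transposes the list of rows into per-column tuples.
    return {name: list(values) for name, values in zip(columns, zip(*rowgroup))}


segments = vertical_partition(ROWGROUP_1, COLUMNS)
print(segments["ProductKey"])  # [106, 103, 109, 103, 106, 106]
print(segments["RegionKey"])   # [1, 2, 2, 2, 3, 1]
```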
3. Compress each segment*
Row group 1
  OrderDateKey: 20101107, 20101108
  ProductKey:   106, 103, 109
  StoreKey:     01, 04, 03, 05, 02
  RegionKey:    1, 2
  Quantity:     6, 1, 2, 4, 5
  SalesAmount:  30.00, 17.00, 20.00, 25.00
Row group 2
  OrderDateKey: 20101108, 20101109
  ProductKey:   102, 106, 109, 103
  StoreKey:     02, 03, 01, 04
  RegionKey:    1, 2
  Quantity:     1, 5, 4
  SalesAmount:  14.00, 25.00, 10.00, 20.00, 25.00, 17.00
Some segments will compress more than others.
*Encoding and reordering not shown
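The slide deliberately leaves out the encoding details. As a rough illustration only, the Python sketch below applies two generic columnar techniques, dictionary encoding followed by run-length encoding, to two segments from the example. This is a simplification, not SQL Server's actual segment format (which also involves row reordering and further encoding steps), and the function names are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


# OrderDateKey in row group 1: long runs, compresses very well.
dates = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
date_dict, date_codes = dictionary_encode(dates)
print(date_dict)                        # [20101107, 20101108]
print(run_length_encode(date_codes))    # [(0, 5), (1, 1)]

# ProductKey in row group 2: few repeats, so run-length encoding helps little.
products = [102, 106, 109, 106, 106, 103]
prod_dict, prod_codes = dictionary_encode(products)
print(prod_dict)                        # [102, 106, 109, 103]
print(run_length_encode(prod_codes))    # [(0, 1), (1, 1), (2, 1), (1, 2), (3, 1)]
```

The two dictionaries match the values shown for those segments on the slide, and the difference in run counts is one way to see why some segments compress more than others.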
4. Fetch only needed columns and row groups
SELECT ProductKey, SUM(SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey
• Only the OrderDateKey, ProductKey and SalesAmount segments are read; the StoreKey, RegionKey and Quantity segments are never touched
• Row group 2 contains only OrderDateKey values from 20101108 to 20101109, so none of its rows can satisfy the filter and the whole row group is skipped
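The same query can be traced in a small Python model. Each row group keeps a (min, max) pair per segment, in the spirit of the per-segment metadata SQL Server uses for segment elimination; a row group whose OrderDateKey minimum is not below the filter value is skipped, and only the three referenced columns are read. The data structures and function names are hypothetical, and the row group size of 6 is the slides' illustrative value.

```python
from collections import defaultdict

COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount")
ROWS = [
    (20101107, 106, "01", 1, 6, 30.00), (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00), (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00), (20101108, 106, "02", 1, 5, 25.00),
    (20101108, 102, "02", 1, 1, 14.00), (20101108, 106, "03", 2, 5, 25.00),
    (20101108, 109, "01", 1, 1, 10.00), (20101109, 106, "04", 2, 4, 20.00),
    (20101109, 106, "04", 2, 5, 25.00), (20101109, 103, "01", 1, 1, 17.00),
]
ROWGROUP_SIZE = 6  # illustrative, as on the slides


def build_rowgroups(rows, size):
    """Split rows into row groups, store them column-wise and record min/max per segment."""
    groups = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        segments = {name: list(vals) for name, vals in zip(COLUMNS, zip(*chunk))}
        meta = {name: (min(vals), max(vals)) for name, vals in segments.items()}
        groups.append({"segments": segments, "meta": meta})
    return groups


def sales_by_product_before(groups, date_limit):
    """SELECT ProductKey, SUM(SalesAmount) ... WHERE OrderDateKey < date_limit GROUP BY ProductKey."""
    totals = defaultdict(float)
    scanned = []
    for group_id, group in enumerate(groups, start=1):
        lowest_date, _highest_date = group["meta"]["OrderDateKey"]
        if lowest_date >= date_limit:
            continue  # no row in this row group can qualify: skip it entirely
        scanned.append(group_id)
        seg = group["segments"]
        # Only the three referenced segments are read.
        for date, product, amount in zip(seg["OrderDateKey"], seg["ProductKey"], seg["SalesAmount"]):
            if date < date_limit:
                totals[product] += amount
    return dict(totals), scanned


groups = build_rowgroups(ROWS, ROWGROUP_SIZE)
result, scanned = sales_by_product_before(groups, 20101108)
print(scanned)  # [1]
print(result)   # {106: 50.0, 103: 34.0, 109: 20.0}
```

Note that the row-level filter is still needed inside row group 1, because its last row (20101108) does not qualify.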
Practical case
• Scenario:
– An energy trading company migrates its BI solution to SQL Server 2012
• Problems:
– The ETL flow and intermediate calculations take too long
– Loading fact tables with many indexes is slow, and the indexes consume a lot of storage
– Processing the Analysis Services OLAP cube is slow
– End-user reporting on the relational data mart has long response times in certain scenarios
Solution 1: Optimize complex ETL calculations
Before optimization (1 hour for 6 million rows):
Stage basic trade data (5 min) → Do derived calculations (50 min) → Load fact table (5 min)
After optimization (13 min for 6 million rows):
Drop columnstore index (0 min) → Stage basic trade data (5 min) → Create columnstore index (2 min) → Do derived calculations (1 min) → Load fact table (5 min)
Solution 2: Reduce fact load time and save disk space
Before optimization (41 min for 20 million rows, or 45 min without dropping the indexes; 8 GB index space):
Drop non-clustered indexes (1 min) → Load fact table (25 min; 45 min when the indexes are not dropped) → Create non-clustered indexes (15 min)
After optimization (32 min for 20 million rows; 1 GB index space):
Drop columnstore index (0 min) → Load fact table (25 min) → Create columnstore index (7 min)
Some queries got a bit slower!
Solution 3: Slow processing of OLAP cube
SSAS MOLAP cube partitioned like the fact table; 300 million rows in total.
Partition switching is used for the fact table load, with an average change of 30 million rows per day.
Before optimization (1 hour for 30 million rows):
Load switch-in table (30 min) → Switch partition to fact table (0 min) → Process OLAP cube (30 min)
After optimization (55 min for 30 million rows, plus better performance for other queries):
Drop columnstore index (0 min) → Load switch-in table (30 min) → Create columnstore index (5 min) → Switch partition to fact table (0 min) → Process OLAP cube (20 min)
Solution 3: Slow processing of OLAP cube
• Only a small time saving on cube processing…
• But what if the storage mode were changed from MOLAP to ROLAP or HOLAP?
• Small experiment:
– Some OLAP queries got slower
– Processing got a lot faster, especially with ROLAP, since there were no aggregations to process
– Saved OLAP storage space
Solution 4: Reduce reporting query time
Before optimization: 210 seconds for a star schema join and aggregation
Change: add a columnstore index to the fact table in the ETL
After optimization: 10 seconds for the same query
21× faster!
Columnstore in SQL Server 2014
• New: clustered columnstore
– The dependency on conventional B-tree structures has been removed
– Potential for significant disk space savings if the workload can be served without conventional indexes
• Note: the non-clustered columnstore is still supported and is still a read-only structure
– Required if:
  – Constraints are required
  – The workload requires B-tree non-clustered indexes
Columnstore in SQL Server 2014
• Fully read/write
– Less complicated ETL
– But partition switching and BULK INSERT remain best practices
• Expanded data type support:
– All data types except (n)varchar(max), varbinary(max), XML, spatial and CLR types (the BLOB data types)
Columnstore in SQL Server 2014
• Improved “batch mode” query plans
– New support for:
  • All joins (including OUTER, HASH, SEMI (NOT IN, IN))
  • UNION ALL
  • Scalar aggregates
  • “Mixed mode” plans
Columnstore in SQL Server 2014: Inserting and updating data
• Bulk insert
– Creates row groups of 1 million (1,048,576) rows; the last row group is probably not full
– But if it has fewer than about 100K (102,400) rows, those rows are left in the row store (delta store)
• Insert/update
– Collects rows in the row store (delta store)
• Tuple Mover
– When the row store reaches 1 million rows, converts it to a columnstore row group
– Runs every 5 minutes by default
– Can be started explicitly with ALTER INDEX <name> ON <table> REORGANIZE
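The load path described above can be modeled as a small state machine. This Python sketch is a simplification under stated assumptions: it tracks only row counts, uses the 1,048,576-row row group limit and the 102,400-row bulk-load threshold, sends a bulk-insert remainder below that threshold to the delta store, and treats the Tuple Mover as compressing only delta row groups that are already full (closed). The class and method names are hypothetical; the real engine has more states and rules.

```python
MAX_ROWGROUP_ROWS = 1_048_576  # a row group (or delta row group) holds at most this many rows
BULK_LOAD_MIN_ROWS = 102_400   # smaller bulk-insert remainders stay in the delta store


class ColumnstoreModel:
    """Counts-only model of compressed row groups and the delta store (row store)."""

    def __init__(self):
        self.compressed = []    # row counts of compressed columnstore row groups
        self.closed_delta = []  # full delta row groups waiting for the Tuple Mover
        self.open_delta = 0     # rows in the currently open delta row group

    def _add_to_delta(self, rows):
        # Fill the open delta row group and close it whenever it reaches the limit.
        while rows > 0:
            take = min(rows, MAX_ROWGROUP_ROWS - self.open_delta)
            self.open_delta += take
            rows -= take
            if self.open_delta == MAX_ROWGROUP_ROWS:
                self.closed_delta.append(self.open_delta)
                self.open_delta = 0

    def bulk_insert(self, rows):
        # Full row groups are compressed directly.
        full, remainder = divmod(rows, MAX_ROWGROUP_ROWS)
        self.compressed.extend([MAX_ROWGROUP_ROWS] * full)
        if remainder >= BULK_LOAD_MIN_ROWS:
            self.compressed.append(remainder)  # last row group, not full
        elif remainder:
            self._add_to_delta(remainder)      # too small: left in the row store

    def insert(self, rows):
        # Trickle inserts always land in the delta store.
        self._add_to_delta(rows)

    def tuple_mover(self):
        # Compress closed delta row groups; SQL Server runs this in the background
        # (every 5 minutes by default) or on ALTER INDEX ... REORGANIZE.
        self.compressed.extend(self.closed_delta)
        self.closed_delta = []


model = ColumnstoreModel()
model.bulk_insert(2_500_000)  # 2 full row groups + 402,848 rows compressed directly
model.bulk_insert(50_000)     # below 102,400 rows: goes to the delta store
model.insert(1_000_000)       # fills and closes one delta row group; 1,424 rows stay open
model.tuple_mover()
print(model.compressed)  # [1048576, 1048576, 402848, 1048576]
print(model.open_delta)  # 1424
```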
Comparison: Pros and cons
Non-clustered columnstore
• Pros
– Fastest for queries
– Allows other row-based indexes
• Cons
– Not updateable
– Uses more storage
– More complex ETL design
Clustered columnstore
• Pros
– Allows updating the table
– Easier ETL design
– Faster load
– Minimal storage usage
• Cons
– No unique or key constraints!
– No non-clustered indexes
– Requires periodic index maintenance
Questions 
03-11-2014

More Related Content

Similar to Christian Winther Kristensen

Oracle Result Cache deep dive
Oracle Result Cache deep diveOracle Result Cache deep dive
Oracle Result Cache deep diveAlexander Tokarev
 
Performance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsPerformance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsKPI Partners
 
Oracle in-Memory Column Store for BI
Oracle in-Memory Column Store for BIOracle in-Memory Column Store for BI
Oracle in-Memory Column Store for BIFranck Pachot
 
Clustered Columnstore Introduction
Clustered Columnstore IntroductionClustered Columnstore Introduction
Clustered Columnstore IntroductionNiko Neugebauer
 
Oracle DB In-Memory technologie v kombinaci s procesorem M7
Oracle DB In-Memory technologie v kombinaci s procesorem M7Oracle DB In-Memory technologie v kombinaci s procesorem M7
Oracle DB In-Memory technologie v kombinaci s procesorem M7MarketingArrowECS_CZ
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Faster transactions & analytics with the new SQL2016 In-memory technologies
Faster transactions & analytics with the new SQL2016 In-memory technologiesFaster transactions & analytics with the new SQL2016 In-memory technologies
Faster transactions & analytics with the new SQL2016 In-memory technologiesHenk van der Valk
 
Implementing Tables and Views.pptx
Implementing Tables and Views.pptxImplementing Tables and Views.pptx
Implementing Tables and Views.pptxLuisManuelUrbinaAmad
 
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]ITCamp
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Niko Neugebauer
 
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Microsoft
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Featuresaminmesbahi
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaDataWorks Summit
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseEdgar Alejandro Villegas
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentalsChris Adkin
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDenny Lee
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...Edgar Alejandro Villegas
 
IBM Tivoli Storage Manager V6 - PCTY 2011
IBM Tivoli Storage Manager V6 - PCTY 2011IBM Tivoli Storage Manager V6 - PCTY 2011
IBM Tivoli Storage Manager V6 - PCTY 2011IBM Sverige
 

Similar to Christian Winther Kristensen (20)

Oracle Result Cache deep dive
Oracle Result Cache deep diveOracle Result Cache deep dive
Oracle Result Cache deep dive
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
Performance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsPerformance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI Applications
 
Oracle in-Memory Column Store for BI
Oracle in-Memory Column Store for BIOracle in-Memory Column Store for BI
Oracle in-Memory Column Store for BI
 
Clustered Columnstore Introduction
Clustered Columnstore IntroductionClustered Columnstore Introduction
Clustered Columnstore Introduction
 
Oracle DB In-Memory technologie v kombinaci s procesorem M7
Oracle DB In-Memory technologie v kombinaci s procesorem M7Oracle DB In-Memory technologie v kombinaci s procesorem M7
Oracle DB In-Memory technologie v kombinaci s procesorem M7
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Faster transactions & analytics with the new SQL2016 In-memory technologies
Faster transactions & analytics with the new SQL2016 In-memory technologiesFaster transactions & analytics with the new SQL2016 In-memory technologies
Faster transactions & analytics with the new SQL2016 In-memory technologies
 
Implementing Tables and Views.pptx
Implementing Tables and Views.pptxImplementing Tables and Views.pptx
Implementing Tables and Views.pptx
 
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016
 
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Features
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentals
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons Learned
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
 
IBM Tivoli Storage Manager V6 - PCTY 2011
IBM Tivoli Storage Manager V6 - PCTY 2011IBM Tivoli Storage Manager V6 - PCTY 2011
IBM Tivoli Storage Manager V6 - PCTY 2011
 
SQL Tuning 101
SQL Tuning 101SQL Tuning 101
SQL Tuning 101
 

More from InfinIT - Innovationsnetværket for it

More from InfinIT - Innovationsnetværket for it (20)

Erfaringer med-c kurt-noermark
Erfaringer med-c kurt-noermarkErfaringer med-c kurt-noermark
Erfaringer med-c kurt-noermark
 
Object orientering, test driven development og c
Object orientering, test driven development og cObject orientering, test driven development og c
Object orientering, test driven development og c
 
Embedded softwaredevelopment hcs
Embedded softwaredevelopment hcsEmbedded softwaredevelopment hcs
Embedded softwaredevelopment hcs
 
C og c++-jens lund jensen
C og c++-jens lund jensenC og c++-jens lund jensen
C og c++-jens lund jensen
 
201811xx foredrag c_cpp
201811xx foredrag c_cpp201811xx foredrag c_cpp
201811xx foredrag c_cpp
 
C som-programmeringssprog-bt
C som-programmeringssprog-btC som-programmeringssprog-bt
C som-programmeringssprog-bt
 
Infinit seminar 060918
Infinit seminar 060918Infinit seminar 060918
Infinit seminar 060918
 
DCR solutions
DCR solutionsDCR solutions
DCR solutions
 
Not your grandfathers BPM
Not your grandfathers BPMNot your grandfathers BPM
Not your grandfathers BPM
 
Kmd workzone - an evolutionary approach to revolution
Kmd workzone - an evolutionary approach to revolutionKmd workzone - an evolutionary approach to revolution
Kmd workzone - an evolutionary approach to revolution
 
EcoKnow - oplæg
EcoKnow - oplægEcoKnow - oplæg
EcoKnow - oplæg
 
Martin Wickins Chatbots i fronten
Martin Wickins Chatbots i frontenMartin Wickins Chatbots i fronten
Martin Wickins Chatbots i fronten
 
Marie Fenger ai kundeservice
Marie Fenger ai kundeserviceMarie Fenger ai kundeservice
Marie Fenger ai kundeservice
 
Mads Kaysen SupWiz
Mads Kaysen SupWizMads Kaysen SupWiz
Mads Kaysen SupWiz
 
Leif Howalt NNIT Service Support Center
Leif Howalt NNIT Service Support CenterLeif Howalt NNIT Service Support Center
Leif Howalt NNIT Service Support Center
 
Jan Neerbek NLP og Chatbots
Jan Neerbek NLP og ChatbotsJan Neerbek NLP og Chatbots
Jan Neerbek NLP og Chatbots
 
Anders Soegaard NLP for Customer Support
Anders Soegaard NLP for Customer SupportAnders Soegaard NLP for Customer Support
Anders Soegaard NLP for Customer Support
 
Stephen Alstrup infinit august 2018
Stephen Alstrup infinit august 2018Stephen Alstrup infinit august 2018
Stephen Alstrup infinit august 2018
 
Innovation og værdiskabelse i it-projekter
Innovation og værdiskabelse i it-projekterInnovation og værdiskabelse i it-projekter
Innovation og værdiskabelse i it-projekter
 
Rokoko infin it presentation
Rokoko infin it presentation Rokoko infin it presentation
Rokoko infin it presentation
 

Recently uploaded

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Christian Winther Kristensen

  • 1. APPLICATION OF SQL SERVER COLUMNSTORE INDEXES IN BI-SOLUTIONS Temadag: Modern Analytical Database Technology 28. oktober 2014, Aalborg Universitet Christian Winther Kristensen Managing consultant cwk@rehfeld.dk
  • 2. Agenda • SQL server columnstore index • Practical case • New updateable clustered columnstore in SQL server 2014 • Comparison: Pros and cons • Questions 03-11-2014
  • 3. SQL server columnstore index • Came in SQL server 2012 • Shares Microsoft xVelocity columnstore technology with Analysis Services Tabular model and PowerPivot • Highly compressed • Memory optimized • Not updateable  underlying table is read only! 03-11-2014
  • 4. Star schema 4 FactSales DimCustomer FactSales ( CustomerKey int , ProductKey int , EmployeeKey int , StoreKey int , OrderDateKey int , SalesAmount money ) ‐‐note: lots of ints in fact tables DimCustomer ( CustomerKey int , FirstName nvarchar(50) , LastName nvarchar(50) , Birthdate date , EmailAddress nvarchar(50) ) DimProduct (… Best Practice: Integer keys! DimDate DimEmployee DimStore
  • 5. How do columnstore indexes optimize performance? … Columnstore indexes store data column-wise  Each page stores data from a single column  Highly compressed  About 2x better than PAGE compression  More data fits in memory  Each column accessed independently  Fetch only needed columns  Can dramatically decrease I/O C1 C2 C3 C4 Heaps, B-trees store data row-wise
  • 6. Columnstore index architecture • Row Group – 1 million logically contiguous rows • Column Segment – Segment contains values from one column for a set of rows – Segments for the same set of rows comprise a row group – Segments are compressed – Each segment stored in a separate LOB – Segment is unit of transfer between disk and memory Segment C1 C2 C3 C4 C5 C6 Row Group 6
  • 7. Columnstore index example OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101107 106 01 1 6 30.00 20101107 103 04 2 1 17.00 20101107 109 04 2 2 20.00 20101107 103 03 2 1 17.00 20101107 106 05 3 4 20.00 20101108 106 02 1 5 25.00 20101108 102 02 1 1 14.00 20101108 106 03 2 5 25.00 20101108 109 01 1 1 10.00 20101109 106 04 2 4 20.00 20101109 106 04 2 5 25.00 20101109 103 01 1 1 17.00 7
  • 8. 1. Horizontally partition (Row Groups) OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101107 106 01 1 6 30.00 20101107 103 04 2 1 17.00 20101107 109 04 2 2 20.00 20101107 103 03 2 1 17.00 20101107 106 05 3 4 20.00 20101108 106 02 1 5 25.00 8 OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101108 102 02 1 1 14.00 20101108 106 03 2 5 25.00 20101108 109 01 1 1 10.00 20101109 106 04 2 4 20.00 20101109 106 04 2 5 25.00 20101109 103 01 1 1 17.00
  • 9. 2. Vertically partition via columns (segments) 9 OrderDateKey 20101107 20101107 20101107 20101107 20101107 20101108 ProductKey 106 103 109 103 106 106 StoreKey 01 04 04 03 05 02 RegionKey 1 2 2 2 3 1 Quantity 6 1 2 1 4 5 SalesAmount 30.00 17.00 20.00 17.00 20.00 25.00 OrderDateKey 20101108 20101108 20101108 20101109 20101109 20101109 ProductKey 102 106 109 106 106 103 StoreKey 02 03 01 04 04 01 RegionKey 1 2 1 2 2 1 Quantity 1 5 1 4 5 1 SalesAmount 14.00 25.00 10.00 20.00 25.00 17.00
  • 10. Step 3: Compress each segment*
    Each compressed segment is represented here by its distinct values. Some segments will compress more than others.
    Row group 1:
      OrderDateKey: 20101107 20101108
      ProductKey:   106 103 109
      StoreKey:     01 04 03 05 02
      RegionKey:    1 2 3
      Quantity:     6 1 2 4 5
      SalesAmount:  30.00 17.00 20.00 25.00
    Row group 2:
      OrderDateKey: 20101108 20101109
      ProductKey:   102 106 109 103
      StoreKey:     02 03 01 04
      RegionKey:    1 2
      Quantity:     1 5 4
      SalesAmount:  14.00 25.00 10.00 20.00 17.00
    *Encoding and reordering not shown
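The slide leaves the encoding out. Columnar engines commonly combine dictionary encoding with run-length encoding, and the Python sketch below uses that pair as an illustration. It is not a reproduction of SQL Server's actual encodings or row reordering, and the function names are hypothetical. The dictionaries it produces match the distinct-value lists above. The single long run in OrderDateKey shows why some segments compress better than others.

```python
def dictionary_encode(values):
    """Map each distinct value (in first-seen order) to a small integer code."""
    dictionary, codes, code_of = [], [], {}
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse runs of equal consecutive codes into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs


# Two segments from row group 1 (step 2).
order_dates = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
products = [106, 103, 109, 103, 106, 106]

for name, segment in [("OrderDateKey", order_dates), ("ProductKey", products)]:
    dictionary, codes = dictionary_encode(segment)
    print(name, dictionary, run_length_encode(codes))
# OrderDateKey [20101107, 20101108] [(0, 5), (1, 1)]
# ProductKey [106, 103, 109] [(0, 1), (1, 1), (2, 1), (1, 1), (0, 2)]
```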
  • 11. Step 4: Fetch only needed columns and row groups
    [Figure: the compressed segments from step 3, with the segments needed by the query below highlighted]
    SELECT ProductKey, SUM(SalesAmount)
    FROM SalesTable
    WHERE OrderDateKey < 20101108
    GROUP BY ProductKey
    Only the OrderDateKey, ProductKey and SalesAmount segments are read. Row group 2 can be skipped entirely, because its OrderDateKey segment holds only 20101108 and 20101109, and neither value satisfies the filter.
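The query above can be traced in a small Python model. Each row group stores per-column segments with min/max metadata. A row group whose OrderDateKey minimum already fails the filter is skipped without reading any of its segments. Only three of the six columns are read for the surviving group. This is an illustration under those assumptions, not SQL Server's execution engine, and all names are hypothetical.

```python
from collections import defaultdict

COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount"]
ROWS = [
    (20101107, 106, 1, 1, 6, 30.00), (20101107, 103, 4, 2, 1, 17.00),
    (20101107, 109, 4, 2, 2, 20.00), (20101107, 103, 3, 2, 1, 17.00),
    (20101107, 106, 5, 3, 4, 20.00), (20101108, 106, 2, 1, 5, 25.00),
    (20101108, 102, 2, 1, 1, 14.00), (20101108, 106, 3, 2, 5, 25.00),
    (20101108, 109, 1, 1, 1, 10.00), (20101109, 106, 4, 2, 4, 20.00),
    (20101109, 106, 4, 2, 5, 25.00), (20101109, 103, 1, 1, 1, 17.00),
]


def build_row_groups(rows, rows_per_group):
    """Steps 1 and 2: chunk rows into row groups, then store each column
    as a segment with min/max metadata."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        group = {}
        for i, name in enumerate(COLUMNS):
            values = [row[i] for row in chunk]
            group[name] = {"values": values, "min": min(values), "max": max(values)}
        groups.append(group)
    return groups


def sales_by_product_before(groups, date_bound):
    """SELECT ProductKey, SUM(SalesAmount) FROM SalesTable
    WHERE OrderDateKey < date_bound GROUP BY ProductKey"""
    totals = defaultdict(float)
    scanned = 0
    for group in groups:
        dates = group["OrderDateKey"]
        if dates["min"] >= date_bound:
            continue  # segment elimination: no row in this row group can qualify
        scanned += 1
        # Only three of the six segments are read; the others are never touched.
        for date, product, amount in zip(dates["values"],
                                         group["ProductKey"]["values"],
                                         group["SalesAmount"]["values"]):
            if date < date_bound:
                totals[product] += amount
    return dict(totals), scanned


groups = build_row_groups(ROWS, rows_per_group=6)
totals, scanned = sales_by_product_before(groups, 20101108)
print(totals)   # {106: 50.0, 103: 34.0, 109: 20.0}
print(scanned)  # 1
```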
  • 12. Practical case
    • Scenario:
      – An energy trading company migrates its BI solution to SQL Server 2012
    • Problems:
      – The ETL flow and intermediary calculations take too long
      – Loading fact tables with many indexes is slow, and the indexes consume a lot of storage
      – Processing of the Analysis Services OLAP cube is slow
      – End-user reporting on the relational data mart has long response times in certain scenarios
  • 13. Solution 1: Optimize complex ETL calculations
    Before optimization (1 hour for 6 million rows):
      Stage basic trade data (5 min) → Do derived calculations (50 min) → Load fact table (5 min)
    After optimization (13 min for 6 million rows):
      Drop columnstore index (0 min) → Stage basic trade data (5 min) → Create columnstore index (2 min) → Do derived calculations (1 min) → Load fact table (5 min)
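A quick Python check that the step timings on the slide add up to the stated totals, and what overall speed-up they imply. The timings are the slide's; the helper is hypothetical. Nearly all of the gain comes from the derived calculations (50 min down to 1 min), which in the new flow run after the columnstore index has been built. That build takes only 2 minutes.

```python
def total_minutes(steps):
    """Sum the durations of a list of (step name, minutes) pairs."""
    return sum(minutes for _, minutes in steps)


before = [
    ("Stage basic trade data", 5),
    ("Do derived calculations", 50),
    ("Load fact table", 5),
]
after = [
    ("Drop columnstore index", 0),
    ("Stage basic trade data", 5),
    ("Create columnstore index", 2),
    ("Do derived calculations", 1),
    ("Load fact table", 5),
]

print(total_minutes(before), total_minutes(after))             # 60 13
print(round(total_minutes(before) / total_minutes(after), 1))  # 4.6
```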
  • 14. Solution 2: Reduce fact load time and save disk space
    Before optimization (41 min for 20 million rows, or 45 min without dropping the indexes; 8 GB index space):
      Drop non-clustered indexes (1 min) → Load fact table (25 min; 45 min without dropping the indexes) → Create non-clustered indexes (15 min)
    After optimization (32 min for 20 million rows; 1 GB index space):
      Drop columnstore index (0 min) → Load fact table (25 min) → Create columnstore index (7 min)
    Some queries got a bit slower!
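The three load strategies on the slide can be compared side by side in Python. The minutes and gigabytes are the slide's; the strategy labels and variable names are hypothetical. The load itself takes 25 minutes either way. The gain comes from building one columnstore index (7 min) instead of the non-clustered indexes (15 min), and from an eighth of the index space.

```python
strategies = {
    "drop and recreate non-clustered indexes": [
        ("Drop non-clustered indexes", 1),
        ("Load fact table", 25),
        ("Create non-clustered indexes", 15),
    ],
    "load with non-clustered indexes in place": [
        ("Load fact table", 45),
    ],
    "drop and recreate columnstore index": [
        ("Drop columnstore index", 0),
        ("Load fact table", 25),
        ("Create columnstore index", 7),
    ],
}

totals = {name: sum(minutes for _, minutes in steps) for name, steps in strategies.items()}
for name, minutes in totals.items():
    print(f"{name}: {minutes} min")  # 41, 45 and 32 min

index_space_gb = {"non-clustered indexes": 8, "columnstore index": 1}
print(index_space_gb["non-clustered indexes"] / index_space_gb["columnstore index"])  # 8.0
```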
  • 15. Solution 3: Slow processing of OLAP cube
    The SSAS MOLAP cube is partitioned like the fact table, with 300 million rows in total. Partition switching is used for the fact table load, with an average change of 30 million rows per day.
    Before optimization (1 hour for 30 million rows):
      Load switch-in table (30 min) → Switch partition to fact table (0 min) → Process OLAP cube (30 min)
    After optimization (55 min for 30 million rows, plus better performance for other queries):
      Drop columnstore index (0 min) → Load switch-in table (30 min) → Create columnstore index (5 min) → Switch partition to fact table (0 min) → Process OLAP cube (20 min)
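Two properties of partition switching explain this flow. First, `ALTER TABLE ... SWITCH` is a metadata-only operation, which is why the switch step costs 0 min. Second, it requires an empty target partition and a switch-in table that matches the target, including its indexes, which is why the columnstore index is created on the switch-in table before the switch. The toy Python model below mimics just those two properties. Its classes, function and partition id are hypothetical, and it is an analogy rather than a model of SQL Server's storage.

```python
class Table:
    """Toy unpartitioned table: a list of rows plus the set of indexes it carries."""

    def __init__(self, indexes, rows=None):
        self.indexes = frozenset(indexes)
        self.rows = rows if rows is not None else []


class PartitionedTable:
    """Toy partitioned table: partition id -> list of rows."""

    def __init__(self, indexes):
        self.indexes = frozenset(indexes)
        self.partitions = {}


def switch_in(staging, target, partition_id):
    """Hand the staging rows to an empty target partition by reference:
    nothing is copied, so the cost does not depend on the row count."""
    if staging.indexes != target.indexes:
        raise ValueError("staging and target must carry the same indexes")
    if target.partitions.get(partition_id):
        raise ValueError("target partition must be empty")
    target.partitions[partition_id] = staging.rows
    staging.rows = []


fact = PartitionedTable(indexes={"columnstore"})
staging = Table(indexes=set(), rows=[("trade", i) for i in range(5)])

try:
    switch_in(staging, fact, partition_id=20141028)
except ValueError as err:
    print(err)  # staging and target must carry the same indexes

staging.indexes = frozenset({"columnstore"})  # the "Create columnstore index" step
loaded = staging.rows
switch_in(staging, fact, partition_id=20141028)
print(fact.partitions[20141028] is loaded, len(staging.rows))  # True 0
```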
  • 16. Solution 3: Slow processing of OLAP cube
    • Only a small time saving on cube processing…
    • But what if the storage mode was changed from MOLAP to ROLAP or HOLAP?
    • Small experiment:
      – Some OLAP queries got slower
      – Processing got a lot faster, especially ROLAP, since it builds no aggregations
      – OLAP storage space was saved
  • 17. Solution 4: Reduce reporting query time
    Before optimization: 210 seconds for a star schema join and aggregation
    After optimization (columnstore index added to the fact table in the ETL): 10 seconds for the same query
    21X FASTER!
  • 18. Columnstore in SQL 2014
    • New: Clustered columnstore
      – The dependency on conventional B-tree structures has been removed
      – Potential for significant disk space savings if the workload is satisfied without conventional indexes
    • Note: Non-clustered columnstore is still supported and is still a read-only structure
      – Required if:
        – Constraints are required
        – The workload requires B-tree non-clustered indexes
  • 19. Columnstore in SQL 2014
    • Fully read/write
      – Less complicated ETL
      – But partition switching and BULK INSERT remain best practices
    • Data type support expanded:
      – All data types except (n)varchar(max), varbinary(max), XML, spatial and CLR types (the BLOB data types)
  • 20. Columnstore in SQL 2014
    • “Batch mode” query plans improved
      – New support for:
        • All joins (including OUTER, HASH, SEMI (NOT IN, IN))
        • UNION ALL
        • Scalar aggregates
        • “Mixed mode” plans
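In batch mode, query operators process a batch of rows, laid out column by column, per call instead of one row at a time. Microsoft's documentation describes batches of around 900 rows, and the sketch uses that figure as its default. The Python below only counts operator calls for a SUM in each mode. In SQL Server the saving comes from amortizing per-call overhead, which this toy merely hints at. Function names are hypothetical.

```python
def sum_row_mode(rows, column):
    """Row mode: the aggregate operator is invoked once per row."""
    total, calls = 0, 0
    for row in rows:
        total += row[column]
        calls += 1
    return total, calls


def sum_batch_mode(column_values, batch_size=900):
    """Batch mode: the operator is invoked once per batch of column values."""
    total, calls = 0, 0
    for start in range(0, len(column_values), batch_size):
        total += sum(column_values[start:start + batch_size])
        calls += 1
    return total, calls


quantities = list(range(2000))
rows = [{"Quantity": q} for q in quantities]
print(sum_row_mode(rows, "Quantity"))  # (1999000, 2000)
print(sum_batch_mode(quantities))      # (1999000, 3)
```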
  • 21. Columnstore in SQL 2014: Inserting & updating data
    • Bulk insert
      – Creates row groups of 1 million rows; the last row group is probably not full
      – But if there are fewer than 100K rows, they are left in the row store (the delta store)
    • Insert/Update
      – Collects rows in the row store
    • Tuple Mover
      – When the row store reaches 1 million rows, it is converted to a columnstore row group
      – Runs every 5 minutes by default
      – Can be started explicitly with ALTER INDEX <name> ON <table> REORGANIZE
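The rules above can be simulated with a small Python model that tracks only row counts. The class and method names are hypothetical. The defaults use the 1,048,576-row cap and the 102,400-row bulk-load threshold that the slide rounds to "1 million" and "100K". The row store on the slide is what SQL Server calls the delta store. As on the slide, the tuple mover here compresses only delta stores that have filled up. In SQL Server it runs in the background, whereas here it is called explicitly.

```python
class ClusteredColumnstoreModel:
    """Toy model of how rows reach compressed row groups (sizes only, no data)."""

    def __init__(self, max_rows=1_048_576, bulk_min=102_400):
        self.max_rows = max_rows  # row group cap, "1 million"
        self.bulk_min = bulk_min  # smaller bulk remainders go to the delta store
        self.compressed = []      # sizes of compressed columnstore row groups
        self.closed_deltas = []   # full delta stores waiting for the tuple mover
        self.open_delta = 0       # rows in the open delta store

    def _add_to_delta(self, n):
        while n > 0:
            take = min(n, self.max_rows - self.open_delta)
            self.open_delta += take
            n -= take
            if self.open_delta == self.max_rows:  # full: close it for the tuple mover
                self.closed_deltas.append(self.open_delta)
                self.open_delta = 0

    def bulk_insert(self, n):
        full_groups, remainder = divmod(n, self.max_rows)
        self.compressed.extend([self.max_rows] * full_groups)
        if remainder >= self.bulk_min:
            self.compressed.append(remainder)  # last row group, not full
        else:
            self._add_to_delta(remainder)

    def insert(self, n):
        self._add_to_delta(n)

    def run_tuple_mover(self):
        self.compressed.extend(self.closed_deltas)
        self.closed_deltas = []


# Tiny limits so the behaviour is easy to follow.
cci = ClusteredColumnstoreModel(max_rows=10, bulk_min=4)
cci.bulk_insert(25)  # two full row groups plus a 5-row group
cci.bulk_insert(3)   # below bulk_min: goes to the delta store
cci.insert(8)        # trickle inserts fill the delta store past 10 rows
print(cci.compressed, cci.closed_deltas, cci.open_delta)  # [10, 10, 5] [10] 1
cci.run_tuple_mover()
print(cci.compressed, cci.closed_deltas, cci.open_delta)  # [10, 10, 5, 10] [] 1
```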
  • 22. Comparison: Pros and cons
    Index type: Non-clustered columnstore
      Pros:
        – Fastest for queries
        – Allows other row-based indexes
      Cons:
        – Not updateable
        – Uses more storage
        – More complex ETL design
    Index type: Clustered columnstore
      Pros:
        – Allows updating the table
        – Easier ETL design
        – Faster load
        – Minimal storage usage
      Cons:
        – No unique or key constraints!
        – No non-clustered indexes
        – Requires periodic index maintenance
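The comparison can be condensed into a rule of thumb for SQL Server 2014, sketched here as a Python function. It is a hypothetical distillation of this slide and the "Required if" note on the SQL 2014 slide, not an official decision procedure.

```python
def choose_columnstore_2014(needs_constraints, needs_btree_indexes, needs_updates):
    """Hypothetical rule of thumb for picking a columnstore index type in SQL Server 2014."""
    if needs_constraints or needs_btree_indexes:
        # Only the non-clustered columnstore coexists with constraints and B-tree indexes.
        choice = "non-clustered columnstore"
        if needs_updates:
            # It makes the table read-only, so loads need a workaround.
            choice += ": table is read-only, so drop/recreate the index or switch partitions to load"
        return choice
    # Updateable and smallest on disk, but needs periodic index maintenance.
    return "clustered columnstore: updateable, minimal storage, needs periodic index maintenance"


print(choose_columnstore_2014(needs_constraints=True, needs_btree_indexes=False, needs_updates=True))
print(choose_columnstore_2014(needs_constraints=False, needs_btree_indexes=False, needs_updates=True))
```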