SlideShare a Scribd company logo
1 of 63
Download to read offline
Clustered Columnstore –
Deep Dive
Niko Neugebauer
Barcelona, October 25th 2014
Have you seen this guy ?
Our Main Sponsors:
Niko Neugebauer
Microsoft Data Platform Professional
OH22 (http://www.oh22.net)
15+ years in IT
SQL Server MVP
Founder of 3 Portuguese PASS Chapters
Blog: http://www.nikoport.com
Twitter: @NikoNeugebauer
LinkedIn: http://pt.linkedin.com/in/webcaravela
Email: info@webcaravela.com
So this is a supposedly Deep Dive 
 My assumptions:
 You have heard about Columnstore Indexes
 You understand the difference between RowStore vs
Columnstore
 You know about Dictionary existance in Columnstore Indexes
 You know how locking & blocking works (at least understand the
S, SI, IX, X locks)
 You have used DBCC Page functionality 
 You are crazy enough to believe that this topic could be
expanded into this kind of level 
Todays Plan
• Intro (About Columnstore, Row Groups,
Segments, Delta-Stores)
• Batch Mode
• Compresson phases
• Dictionaries
• Materialisation
• Meta-informations
• DBCC (Nope )
• Locking & Blocking (Hopefully )
• Bulk Load (Nope )
• Tuple Mover (Nope )
About Columnstore Indexes:
 Reading Fact tables
 Reading Big Dimension tables
 Very low-activity big OLTP tables, which are
scanned & processed almost entirely
 Data Warehouses
 Decision Support Applications
 Business Intelligence Applications
Clustered Columnstore in
SQL Server 2014
 Delta-Stores (open & close)
 Deleted Bitmap
 Delete & Update work as a
DELETE + INSERT
BATCH MODE
Batch Mode
 New Model of Data Processing
 Query execution by using GetNext (), which delivers data to the CPU
(In its turn it will go down the stack and get physical accecss to data.
Every operator behaves this way, which makes GetNext() a virtuall
function.
For execution plans sometimes you will have 100s of this function
invocation before you will get actual 1 row.
For OLTP it might be a good idea, since we are working just with few
rows, but If you are working with millions of rows (in BI or DW) you will
make billions of such invocations.
 Entering Batch Mode, which actually invokes data for processing not 1
by 1 but in Batches of ~900 rows
 This might bring benefits in 10s & 100s times
Batch Mode
 In Batch Mode every operator down the stack have to play
the same game, passing the same amount of data -> Row
Mode can’t interact with Batch Mode.
 64 row vs 900 rows
(Progammers, its like passing an Array vs 1 by 1 param )
 Works exclusively for Columnstore Indexes
 Works exclusively for parallel plans, hence MAXDOP >= 2
 Think about it as if it would be a Factory processing vs
Manual Processing (19th vs 18th Century)
Batch Mode is fragile
 Not every operator is implemented in Batch Mode.
 Examples of Row Mode operators: Sort, Exchange, Inner LOOP,
Merge, ...
 Any disturbance in the force will make Batch Execution Mode to fall
down into Row Execution Mode, for example lack of memory.
 SQL Server 2014 introduces so-called “Mixed Mode”, where execution
plan operators in Row Mode can co-exist with Batch Mode operators
Batch Mode Deep Dive
 Optimized for 64 bit values of the register
 Late materialization (working on compressed values)
 Batch Size is optimized to work in L2 Cache with idea of
avoiding Cache Misses
Latency Cache
 L1 cache reference – 0.5 ns
 L2 cache reference – 7.0 ns (14 times slower)
 L3 cache reference – 28.0 ns (4 times slower)
 L3 cache reference (outside NUMA) – 42.0 ns (6 times slower)
 Main memory reference – 100ns (3 times slower)
 Read 1 MB sequentially from memory – 250.000 ns (5.000
times L1 Cache)
Batch Mode
SQL Server 2014 Batch Mode
• All execution improvements are done for
Nonclustered & Clustered Columnstores
• Mixed Mode – Row & Batch mode can co-exist
• OUTER JOIN, UNION ALL, EXIST, IN, Scalar
Aggregates, Distinct Aggregates – all work in Batch
Mode
• Some TempDB operations for Columnstore Indexes
are running in Batch mode. (TempDB Spill)
BATCH MODE
Demo, Demo, Demo
Basics phases of the
Columnstore Indexes creation:
1. Row Groups separation
2. Segment creation
3. Compression
1. Row Groups creation
~ 1 Million Rows
~ 1 Million Rows
~ 1 Million Rows
~ 1 Million Rows
}
}
}
}
2. Segments separation
Column
Segment
3. Compression (involves reordering,
compression & LOB conversion)
3 C
2 B
4 D
1 A
1 A ... ...
2 B ... ...
3 C ... ...
4 D ... ...
... ...
... ...
... .
.
.
.
..
Columnstore compression steps
(when applicable)
• Value Scale
• Bit-Array
• Run-length compression
• Dictionary encoding
• Huffman encoding
• Binary compression
Value Scale
Amount
1023
1002
1007
1128
1096
1055
1200
1056
Amount
23
2
7
128
96
55
200
56
Base/Scale: 1000
Bit Array
Name
Mark
Andre
John
Mark
John
Andre
John
Mark
Mark Andre John
1 0 0
0 1 0
0 0 1
1 0 0
0 0 1
0 1 0
0 0 1
1 0 0
Run-Length Encoding (compress)
Name
Mark
Mark
John
Andre
Andre
Andre
Ricardo
Mark
Charlie
Mark
Charlie
Name
Mark:2
John:1
Andre:3
Ricardo:1
Mark-Charlie:2
Run-length compression, more complex
scenario
Name Last Name
Mark Simpson
Mark Donalds
John Simpson
Andre White
Andre Donalds
Andre Simpson
Ricardo Simpson
Mark Simpson
Charlie Simpson
Mark White
Charlie Donalds
Name Last Name
Mark:4 Simpson:1
John:1 Donalds:1
Andre:3 Simpson:1
Ricardo:1 White:1
Charlie:2 Simpson:1
White:1
Donalds:1
Simpson:3
Donalds:1
Name Last Name
Mark Simpson
Mark Donalds
Mark Simpson
Mark White
John Simpson
Andre White
Andre Donalds
Andre Simpson
Ricardo Simpson
Charlie Simpson
Charlie Donalds
Run-length compression, more complex
scenario, part 2
Name Last Name
Mark Simpson
Mark Donalds
John Simpson
Andre White
Andre Donalds
Andre Simpson
Ricardo Simpson
Mark Simpson
Charlie Simpson
Mark White
Charlie Donalds
Name Last Name
Andre:1 Donalds:3
Charlie:1 Simpson:6
Mark:3 White:3
Andre:1
Ricarod:1
John:1
Charlie:1
Andre:1
Mark:1
Name Last Name
Andre Donalds
Charlie Donalds
Mark Donalds
Mark Simpson
Mark Simpson
Andre Simpson
Ricardo Simpson
John Simpson
Charlie Simpson
Andre White
Mark White
Dictionary enconding
Name
Mark
Andre
John
Mark
John
Andre
John
Mark
Name
Mark 1
Andre 2
John 3
Name ID
1
2
3
1
3
2
3
1
Huffman enconding (aka ASCII encoding)
Name Count Code
Mark 4 001
Andre 3 010
Charlie 2 011
John 1 100
Ricardo 1 101
 Fairly efficient ~ N log (N)
 Design a Huffman code in linear time if input probabilities
(aka weights) are sorted.
Name Last Name
Mark Simpson
Mark Donalds
Mark Simpson
Mark White
John Simpson
Andre White
Andre Donalds
Andre Simpson
Ricardo Simpson
Charlie Simpson
Charlie Donalds
Huffman enconding tree (sample)
Binary Compression
 Super-secret Vertipac aka xVelocity compression
turning data into LOBs. 
 LOBs are stored by using traditional storage mechanisms
(8K pages & extents)
COLUMNSTORE ARCHIVE
Columnstore Archival Compression
 One more compression level
 Applied over the xVelocity compression
 It is a slight modification of LZ77 (aka Zip)
New!
Compression Recap:
Determination of the best algorithm is the principal key for the
success for the X-Velocity. This process includes data shuffling
between segments and different methods of compression.
Every segment has different data, and so different algorithms with
different success are being applied.
If you are seeing a lot of queries including a predicate on a certain
column, then try creating a traditional clustered index on it
(sorting) and then create a columnstore.
Every compression is supported on the partition level
Compression Example:
0
500,000
1,000,000
1,500,000
2,000,000
2,500,000
Noncompressed Row Page Columnstore Archival
DICTIONARIES
For Columnstore
Dictionaries types
1
2
3
4
Global
Dictionary
Local
Dictionary
Local
Dictionary
Dictionaries
 Global dictionaries, contain entries for each and every of the
existing segments of the same column storage
 Local dictionaries, contain entries for 1 or more segments of the
same column storage
 Sizes varies from 56 bytes (min) to 16 MB (max)
 There is a specialized view which provides information on the
dictionaries, such as entries count, size, etc -
sys.column_store_dictionaries
 Undocumented feature which potentially allow us to consult the
content of the dictionaries (will see it later)
 No all columns will use dictionaries
MATERIALISATION
Let’s execute a query:
select name, count(*)
from dbo.SampleTable
group by name
order by count(*) desc;
Execution Plan:
Materialisation Process
Name
Andre
Miguel
Sofia
Joana
Andre
Sofia
Vitor
Paulo
Joana
Miguel
Paulo
Paulo
Name Value
Andre 1
Joana 2
Miguel 3
Paulo 4
Sofia 5
Vitor 6
Compressed
1
3
5
2
1
5
6
4
2
3
4
4
Select, Count, Group, Sort
Compressed
1
3
5
2
1
5
6
4
2
3
4
4
Item Count
1 2
3 2
5 2
2 2
6 1
4 3
Item Count
4 3
3 2
5 2
2 2
1 2
6 1
Name Count
Andre 3
Joana 2
Miguel 2
Paulo 2
Sofia 2
Vitor 1
Meta-information
• sys.column_store_dictionaries – SQL Server
2012
• sys.column_store_segments – SQL Server
2012
• sys.column_store_row_groups – SQL Server
2014
DBCC CSINDEX
Never try this on anything else besides your own test PC
DBCC CSINDEX
DBCC CSIndex (
{'dbname' | dbid},
rowsetid, --HoBT or PartitionID
columnid, -- Column_id from sys.column_store_segments
rowgroupid, -- segment_id from sys.column_store_segments
object_type, -- 1 (Segment), 2 (Dictionary),
print_option -- [0 or 1 or 2]
[, start]
[, end]
)
LOCKING
Clustered Columnstore Indexes
Columnstore elements:
 Row
 Column
 Row Group
 Segment
 Delta-Store
 Deleted Bitmap
But lock is placed
on Row Group/Delta-Store level
BULK LOAD
Columnstore
BULK Load
 A process completely apart
 102.400 is a magic number which gives you a Segment
instead of a Delta-Store
 For data load, if you order your loaded data into chunks
of 1.045.678 rows for loading – your Columnstore will
be almost perfect 
MEMORY MANAGEMENT
Columnstore Indexes
Memory Management
 Columnstore Indexes consume A LOT of memory
 Columnstore Object Pool – new special pool in SQL 2012+
 New Memory Brocker which divides memory between Row Store & Column
Store
Memory Management
 Memory Grant for Index Creation in MB = ( 4.2 * Cols_Count + 68 ) * DOP +
String_Cols * 34 (2012 Formula)
 When not enough memory granted, you might need to change Resource
Governor limits for the respective group (here setting max percent grant to
50%):
ALTER WORKLOAD GROUP [DEFAULT] WITH
(REQUEST_MAX_MEMORY_GRANT_PERCENT=50);
GO
ALTER RESOURCE GOVERNOR
RECONFIGURE
GO
 Memory Management is automatic, so when you have not enough memory
– then the DOP will be lowered automatically until 2, so the memory
consumption will be lowered.
Data Loading
Data Loading
Data Loading
Update vs Delete + Insert
Backup
Backup size
Restore
Muchas
Gracias
Our Main Sponsors:
Links:
My blog series on Columnstore Indexes (39+ Blogposts):
 http://www.nikoport.com/columnstore/
Remus Rusanu Introduction for Clustered Columnstore:
 http://rusanu.com/2013/06/11/sql-server-clustered-columnstore-
indexes-at-teched-2013/
White Paper on the Clustered Columnstore:
 http://research.microsoft.com/pubs/193599/Apollo3%20-
%20Sigmod%202013%20-%20final.pdf

More Related Content

What's hot

MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStoreMariaDB plc
 
ETL with Clustered Columnstore - PASS Summit 2014
ETL with Clustered Columnstore - PASS Summit 2014ETL with Clustered Columnstore - PASS Summit 2014
ETL with Clustered Columnstore - PASS Summit 2014Niko Neugebauer
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)Karthik .P.R
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScyllaDB
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksMariaDB plc
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using FabricKarthik .P.R
 
Writing powerful stored procedures in PL/SQL
Writing powerful stored procedures in PL/SQLWriting powerful stored procedures in PL/SQL
Writing powerful stored procedures in PL/SQLMariaDB plc
 
How MariaDB is approaching DBaaS
How MariaDB is approaching DBaaSHow MariaDB is approaching DBaaS
How MariaDB is approaching DBaaSMariaDB plc
 
Running Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorRunning Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorScyllaDB
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresqlbotsplash.com
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analyticsDavid Barbarin
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outMariaDB plc
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiRemote MySQL DBA
 
Introduction of MariaDB AX / TX
Introduction of MariaDB AX / TXIntroduction of MariaDB AX / TX
Introduction of MariaDB AX / TXGOTO Satoru
 
Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09GOTO Satoru
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 

What's hot (20)

MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
 
ETL with Clustered Columnstore - PASS Summit 2014
ETL with Clustered Columnstore - PASS Summit 2014ETL with Clustered Columnstore - PASS Summit 2014
ETL with Clustered Columnstore - PASS Summit 2014
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocks
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
 
Writing powerful stored procedures in PL/SQL
Writing powerful stored procedures in PL/SQLWriting powerful stored procedures in PL/SQL
Writing powerful stored procedures in PL/SQL
 
How MariaDB is approaching DBaaS
How MariaDB is approaching DBaaSHow MariaDB is approaching DBaaS
How MariaDB is approaching DBaaS
 
Running Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorRunning Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla Operator
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
NoSQL
NoSQLNoSQL
NoSQL
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale out
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup Mumbai
 
Introduction of MariaDB AX / TX
Introduction of MariaDB AX / TXIntroduction of MariaDB AX / TX
Introduction of MariaDB AX / TX
 
Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 

Viewers also liked

Sql server 2014 x velocity – updateable columnstore indexes
Sql server 2014 x velocity – updateable columnstore indexesSql server 2014 x velocity – updateable columnstore indexes
Sql server 2014 x velocity – updateable columnstore indexesPat Sheehan
 
Sql server data store data access internals
Sql server data store data access internalsSql server data store data access internals
Sql server data store data access internalsMasayuki Ozawa
 
SQL Server 2014 データベースエンジン新機能
SQL Server 2014 データベースエンジン新機能SQL Server 2014 データベースエンジン新機能
SQL Server 2014 データベースエンジン新機能Masayuki Ozawa
 
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介Masayuki Ozawa
 
Columnstore indexes in sql server 2014
Columnstore indexes in sql server 2014Columnstore indexes in sql server 2014
Columnstore indexes in sql server 2014Antonios Chatzipavlis
 
[D35] インメモリーデータベース徹底比較 by Komori
[D35] インメモリーデータベース徹底比較 by Komori[D35] インメモリーデータベース徹底比較 by Komori
[D35] インメモリーデータベース徹底比較 by KomoriInsight Technology, Inc.
 
Sql server 2016 ctp 3.0 新機能
Sql server 2016 ctp 3.0 新機能Sql server 2016 ctp 3.0 新機能
Sql server 2016 ctp 3.0 新機能Masayuki Ozawa
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Featuresaminmesbahi
 

Viewers also liked (9)

Sql server 2014 x velocity – updateable columnstore indexes
Sql server 2014 x velocity – updateable columnstore indexesSql server 2014 x velocity – updateable columnstore indexes
Sql server 2014 x velocity – updateable columnstore indexes
 
Sql server data store data access internals
Sql server data store data access internalsSql server data store data access internals
Sql server data store data access internals
 
SQL Server 2014 データベースエンジン新機能
SQL Server 2014 データベースエンジン新機能SQL Server 2014 データベースエンジン新機能
SQL Server 2014 データベースエンジン新機能
 
DBTS2015_B35_SQLServer2016
DBTS2015_B35_SQLServer2016DBTS2015_B35_SQLServer2016
DBTS2015_B35_SQLServer2016
 
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介
Sql server 2012 の新機能を使ってみよう。db 管理者向け機能の紹介
 
Columnstore indexes in sql server 2014
Columnstore indexes in sql server 2014Columnstore indexes in sql server 2014
Columnstore indexes in sql server 2014
 
[D35] インメモリーデータベース徹底比較 by Komori
[D35] インメモリーデータベース徹底比較 by Komori[D35] インメモリーデータベース徹底比較 by Komori
[D35] インメモリーデータベース徹底比較 by Komori
 
Sql server 2016 ctp 3.0 新機能
Sql server 2016 ctp 3.0 新機能Sql server 2016 ctp 3.0 新機能
Sql server 2016 ctp 3.0 新機能
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Features
 

Similar to Clustered Columnstore - Deep Dive

Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at nightMichael Yarichuk
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentalsChris Adkin
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
SQL Server 2014 In-Memory Tables (XTP, Hekaton)
SQL Server 2014 In-Memory Tables (XTP, Hekaton)SQL Server 2014 In-Memory Tables (XTP, Hekaton)
SQL Server 2014 In-Memory Tables (XTP, Hekaton)Tony Rogerson
 
An introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeAn introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeChris Adkin
 
Data storage systems
Data storage systemsData storage systems
Data storage systemsdelimitry
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsSrinivas Mutyala
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramChris Adkin
 
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)Kyle Davis
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedTony Rogerson
 
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...Marco Gralike
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...Amazon Web Services
 
SQL, Oracle, Joins
SQL, Oracle, JoinsSQL, Oracle, Joins
SQL, Oracle, JoinsGaurish Goel
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architectureAjeet Singh
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftAmazon Web Services LATAM
 

Similar to Clustered Columnstore - Deep Dive (20)

Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentals
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
SQL Server 2014 In-Memory Tables (XTP, Hekaton)
SQL Server 2014 In-Memory Tables (XTP, Hekaton)SQL Server 2014 In-Memory Tables (XTP, Hekaton)
SQL Server 2014 In-Memory Tables (XTP, Hekaton)
 
An introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeAn introduction to column store indexes and batch mode
An introduction to column store indexes and batch mode
 
Data storage systems
Data storage systemsData storage systems
Data storage systems
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training Presentations
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ram
 
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - Advanced
 
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
 
SQL, Oracle, Joins
SQL, Oracle, JoinsSQL, Oracle, Joins
SQL, Oracle, Joins
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon Redshift
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Clustered Columnstore - Deep Dive

  • 1. Clustered Columnstore – Deep Dive Niko Neugebauer Barcelona, October 25th 2014
  • 2. Have you seen this guy ?
  • 4. Niko Neugebauer Microsoft Data Platform Professional OH22 (http://www.oh22.net) 15+ years in IT SQL Server MVP Founder of 3 Portuguese PASS Chapters Blog: http://www.nikoport.com Twitter: @NikoNeugebauer LinkedIn: http://pt.linkedin.com/in/webcaravela Email: info@webcaravela.com
  • 5. So this is a supposedly Deep Dive   My assumptions:  You have heard about Columnstore Indexes  You understand the difference between RowStore vs Columnstore  You know about Dictionary existance in Columnstore Indexes  You know how locking & blocking works (at least understand the S, SI, IX, X locks)  You have used DBCC Page functionality   You are crazy enough to believe that this topic could be expanded into this kind of level 
  • 6. Todays Plan • Intro (About Columnstore, Row Groups, Segments, Delta-Stores) • Batch Mode • Compresson phases • Dictionaries • Materialisation • Meta-informations • DBCC (Nope ) • Locking & Blocking (Hopefully ) • Bulk Load (Nope ) • Tuple Mover (Nope )
  • 7. About Columnstore Indexes:  Reading Fact tables  Reading Big Dimension tables  Very low-activity big OLTP tables, which are scanned & processed almost entirely  Data Warehouses  Decision Support Applications  Business Intelligence Applications
  • 8. Clustered Columnstore in SQL Server 2014  Delta-Stores (open & close)  Deleted Bitmap  Delete & Update work as a DELETE + INSERT
  • 10. Batch Mode  New Model of Data Processing  Query execution by using GetNext (), which delivers data to the CPU (In its turn it will go down the stack and get physical accecss to data. Every operator behaves this way, which makes GetNext() a virtuall function. For execution plans sometimes you will have 100s of this function invocation before you will get actual 1 row. For OLTP it might be a good idea, since we are working just with few rows, but If you are working with millions of rows (in BI or DW) you will make billions of such invocations.  Entering Batch Mode, which actually invokes data for processing not 1 by 1 but in Batches of ~900 rows  This might bring benefits in 10s & 100s times
  • 11. Batch Mode  In Batch Mode every operator down the stack have to play the same game, passing the same amount of data -> Row Mode can’t interact with Batch Mode.  64 row vs 900 rows (Progammers, its like passing an Array vs 1 by 1 param )  Works exclusively for Columnstore Indexes  Works exclusively for parallel plans, hence MAXDOP >= 2  Think about it as if it would be a Factory processing vs Manual Processing (19th vs 18th Century)
  • 12. Batch Mode is fragile  Not every operator is implemented in Batch Mode.  Examples of Row Mode operators: Sort, Exchange, Inner LOOP, Merge, ...  Any disturbance in the force will make Batch Execution Mode to fall down into Row Execution Mode, for example lack of memory.  SQL Server 2014 introduces so-called “Mixed Mode”, where execution plan operators in Row Mode can co-exist with Batch Mode operators
  • 13. Batch Mode Deep Dive  Optimized for 64 bit values of the register  Late materialization (working on compressed values)  Batch Size is optimized to work in L2 Cache with idea of avoiding Cache Misses
  • 14. Latency Cache  L1 cache reference – 0.5 ns  L2 cache reference – 7.0 ns (14 times slower)  L3 cache reference – 28.0 ns (4 times slower)  L3 cache reference (outside NUMA) – 42.0 ns (6 times slower)  Main memory reference – 100ns (3 times slower)  Read 1 MB sequentially from memory – 250.000 ns (5.000 times L1 Cache)
  • 16. SQL Server 2014 Batch Mode • All execution improvements are done for Nonclustered & Clustered Columnstores • Mixed Mode – Row & Batch mode can co-exist • OUTER JOIN, UNION ALL, EXIST, IN, Scalar Aggregates, Distinct Aggregates – all work in Batch Mode • Some TempDB operations for Columnstore Indexes are running in Batch mode. (TempDB Spill)
  • 18. Basics phases of the Columnstore Indexes creation: 1. Row Groups separation 2. Segment creation 3. Compression
  • 19. 1. Row Groups creation ~ 1 Million Rows ~ 1 Million Rows ~ 1 Million Rows ~ 1 Million Rows } } } }
  • 21. 3. Compression (involves reordering, compression & LOB conversion) 3 C 2 B 4 D 1 A 1 A ... ... 2 B ... ... 3 C ... ... 4 D ... ... ... ... ... ... ... . . . . ..
  • 22. Columnstore compression steps (when applicable) • Value Scale • Bit-Array • Run-length compression • Dictionary encoding • Huffman encoding • Binary compression
  • 24. Bit Array Name Mark Andre John Mark John Andre John Mark Mark Andre John 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 1 1 0 0
  • 26. Run-length compression, more complex scenario Name Last Name Mark Simpson Mark Donalds John Simpson Andre White Andre Donalds Andre Simpson Ricardo Simpson Mark Simpson Charlie Simpson Mark White Charlie Donalds Name Last Name Mark:4 Simpson:1 John:1 Donalds:1 Andre:3 Simpson:1 Ricardo:1 White:1 Charlie:2 Simpson:1 White:1 Donalds:1 Simpson:3 Donalds:1 Name Last Name Mark Simpson Mark Donalds Mark Simpson Mark White John Simpson Andre White Andre Donalds Andre Simpson Ricardo Simpson Charlie Simpson Charlie Donalds
  • 27. Run-length compression, more complex scenario, part 2 Name Last Name Mark Simpson Mark Donalds John Simpson Andre White Andre Donalds Andre Simpson Ricardo Simpson Mark Simpson Charlie Simpson Mark White Charlie Donalds Name Last Name Andre:1 Donalds:3 Charlie:1 Simpson:6 Mark:3 White:3 Andre:1 Ricarod:1 John:1 Charlie:1 Andre:1 Mark:1 Name Last Name Andre Donalds Charlie Donalds Mark Donalds Mark Simpson Mark Simpson Andre Simpson Ricardo Simpson John Simpson Charlie Simpson Andre White Mark White
  • 29. Huffman enconding (aka ASCII encoding) Name Count Code Mark 4 001 Andre 3 010 Charlie 2 011 John 1 100 Ricardo 1 101  Fairly efficient ~ N log (N)  Design a Huffman code in linear time if input probabilities (aka weights) are sorted. Name Last Name Mark Simpson Mark Donalds Mark Simpson Mark White John Simpson Andre White Andre Donalds Andre Simpson Ricardo Simpson Charlie Simpson Charlie Donalds
  • 31. Binary Compression  Super-secret Vertipac aka xVelocity compression turning data into LOBs.   LOBs are stored by using traditional storage mechanisms (8K pages & extents)
  • 33. Columnstore Archival Compression  One more compression level  Applied over the xVelocity compression  It is a slight modification of LZ77 (aka Zip) New!
  • 34. Compression Recap: Determination of the best algorithm is the principal key for the success for the X-Velocity. This process includes data shuffling between segments and different methods of compression. Every segment has different data, and so different algorithms with different success are being applied. If you are seeing a lot of queries including a predicate on a certain column, then try creating a traditional clustered index on it (sorting) and then create a columnstore. Every compression is supported on the partition level
  • 38. Dictionaries  Global dictionaries, contain entries for each and every of the existing segments of the same column storage  Local dictionaries, contain entries for 1 or more segments of the same column storage  Sizes varies from 56 bytes (min) to 16 MB (max)  There is a specialized view which provides information on the dictionaries, such as entries count, size, etc - sys.column_store_dictionaries  Undocumented feature which potentially allow us to consult the content of the dictionaries (will see it later)  No all columns will use dictionaries
  • 40. Let’s execute a query: select name, count(*) from dbo.SampleTable group by name order by count(*) desc;
  • 42. Materialisation Process Name Andre Miguel Sofia Joana Andre Sofia Vitor Paulo Joana Miguel Paulo Paulo Name Value Andre 1 Joana 2 Miguel 3 Paulo 4 Sofia 5 Vitor 6 Compressed 1 3 5 2 1 5 6 4 2 3 4 4
  • 43. Select, Count, Group, Sort Compressed 1 3 5 2 1 5 6 4 2 3 4 4 Item Count 1 2 3 2 5 2 2 2 6 1 4 3 Item Count 4 3 3 2 5 2 2 2 1 2 6 1 Name Count Andre 3 Joana 2 Miguel 2 Paulo 2 Sofia 2 Vitor 1
  • 44. Meta-information • sys.column_store_dictionaries – SQL Server 2012 • sys.column_store_segments – SQL Server 2012 • sys.column_store_row_groups – SQL Server 2014
  • 45. DBCC CSINDEX Never try this on anything else besides your own test PC
  • 46. DBCC CSINDEX DBCC CSIndex ( {'dbname' | dbid}, rowsetid, --HoBT or PartitionID columnid, -- Column_id from sys.column_store_segments rowgroupid, -- segment_id from sys.column_store_segments object_type, -- 1 (Segment), 2 (Dictionary), print_option -- [0 or 1 or 2] [, start] [, end] )
  • 48. Columnstore elements:  Row  Column  Row Group  Segment  Delta-Store  Deleted Bitmap But lock is placed on Row Group/Delta-Store level
  • 50. BULK Load  A process completely apart  102.400 is a magic number which gives you a Segment instead of a Delta-Store  For data load, if you order your loaded data into chunks of 1.045.678 rows for loading – your Columnstore will be almost perfect 
  • 52. Memory Management  Columnstore Indexes consume A LOT of memory  Columnstore Object Pool – new special pool in SQL 2012+  New Memory Brocker which divides memory between Row Store & Column Store
  • 53. Memory Management  Memory Grant for Index Creation in MB = ( 4.2 * Cols_Count + 68 ) * DOP + String_Cols * 34 (2012 Formula)  When not enough memory granted, you might need to change Resource Governor limits for the respective group (here setting max percent grant to 50%): ALTER WORKLOAD GROUP [DEFAULT] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT=50); GO ALTER RESOURCE GOVERNOR RECONFIGURE GO  Memory Management is automatic, so when you have not enough memory – then the DOP will be lowered automatically until 2, so the memory consumption will be lowered.
  • 57. Update vs Delete + Insert
  • 63. Links: My blog series on Columnstore Indexes (39+ Blogposts):  http://www.nikoport.com/columnstore/ Remus Rusanu Introduction for Clustered Columnstore:  http://rusanu.com/2013/06/11/sql-server-clustered-columnstore- indexes-at-teched-2013/ White Paper on the Clustered Columnstore:  http://research.microsoft.com/pubs/193599/Apollo3%20- %20Sigmod%202013%20-%20final.pdf