COLUMNSTORE INDEXES

SQL Server 2012

 Denis Reznik
 The Frayman Group
 denisreznik@live.ru
Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• CS indexes Internals
• Adding data to Columnstore index
Row Store and Column Store




 In row store, data is stored tuple by tuple.
 In column store, data is stored column by column
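The difference between the two layouts can be sketched in a few lines. This is a minimal illustration only: the sample rows, the column names, and both helper functions are hypothetical and say nothing about how SQL Server physically stores pages.

```python
# Hypothetical sample data; table and column names are illustrative only.
rows = [
    (1, "Ann", "1 Main St", "Moscow", 34),
    (2, "Bob", "2 Oak Ave", "Kyiv", 41),
    (3, "Eve", "3 Elm Rd", "Moscow", 29),
]
column_names = ("id", "name", "address", "city", "age")

# Row store: the table is a list of complete tuples.
row_store = list(rows)

# Column store: one list per column; position i in every list belongs to row i.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def names_in_city_row_store(city):
    # Every tuple is read in full even though only two attributes are needed.
    return [row[1] for row in row_store if row[3] == city]


def names_in_city_column_store(city):
    # Only the "city" and "name" columns are touched.
    cities = column_store["city"]
    names = column_store["name"]
    return [names[i] for i, c in enumerate(cities) if c == city]


print(names_in_city_row_store("Moscow"))
print(names_in_city_column_store("Moscow"))
```

Both functions return the same answer; the column-store version simply never looks at `id`, `address`, or `age`.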

Row Store and Column Store
 Most queries do not process all the attributes of a particular relation.

 [Figure: table columns id, name, address, city, state, age, with name and address set apart]

 SELECT c.name, c.address
 FROM Customers c
 WHERE c.region = 'Moscow'
Row Store and Column Store

Row Store                            Column Store

(+) Easy to add/modify a record      (+) Only need to read in relevant data

(-) Might read in unnecessary data   (-) Tuple writes require multiple accesses




   So column stores are suitable for read-mostly, read-intensive,
   large data repositories
Compression

 Trades I/O for CPU
    Higher data value locality in column stores
    Techniques such as run length encoding far more useful
 Schemes
    Null Suppression
    Dictionary encoding
    Run Length encoding
    Bit-Vector encoding
    Heavyweight schemes
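Two of the schemes listed above, dictionary encoding and run length encoding, can be sketched as follows. These are the generic textbook forms, written here as an assumption-laden illustration; they are not the actual VertiPaq implementation, and the function names and sample column are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = []
    index = {}
    ids = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Hypothetical "state" column: few distinct values, long runs.
state = ["CA", "CA", "CA", "NY", "NY", "CA", "TX", "TX", "TX", "TX"]
dictionary, ids = dictionary_encode(state)
runs = run_length_encode(ids)
print(dictionary)
print(ids)
print(runs)
```

Because a single column tends to hold similar values next to each other, the run list is usually much shorter than the column itself, which is the "higher data value locality" point made above.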

Columnar storage structure



                C1   C2   C3   C4   C5   C6




Uses VertiPaq compression
Accelerating Data Warehouse Queries with SQL Server 2012
Columnstore Indexes
Improved Data Warehouse Query performance


  Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
  Performance improvements for “typical” data warehouse queries from 10x to 100x
  Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
Good Candidates for Columnstore
Indexing
 Table candidates:
    Very large fact tables (for example, billions of rows)
    Larger dimension tables (millions of rows) with compression-friendly column data
    If unsure, it is easy to create a columnstore index and test the impact on your query workload
 Query candidates (against a table with a columnstore index):
    Scans rather than seeks (columnstore indexes don’t support seek operations)
    Aggregated results far smaller than table size
    Joins to smaller dimension tables
    Filtering on fact / dimension tables (star schema pattern)
    Sub-set of columns (being selective in columns versus returning ALL columns)
Creating a columnstore index

T-SQL




SSMS




Defining the Columnstore Index

 [Figure: base table (clustered index OR heap) with secondary structures: nonclustered index, nonclustered index, nonclustered columnstore index, indexed view, filtered index]

 Columnstore index is nonclustered (secondary)
 Base table can be clustered index or heap
 One CS index per table
 Multiple other nonclustered (B-tree) indexes allowed
    But may not be needed
 CS index must be partition-aligned if table is partitioned
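Column Segments and Dictionaries

 [Figure: columns C1 to C6 divided into segment 1 … segment N, each covering a set of about 1M rows; one column's part of such a set is a column segment; dictionaries are shown alongside the segments]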
Memory management

• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
    • Columnstore segments flow between disk and memory

SELECT C2, SUM(C4)
FROM T
GROUP BY C2;

[Figure: segments of columns T.C1 to T.C4 on disk; for this query only the T.C2 and T.C4 segments are brought into memory]
Look inside Columnstore Indexes
xVelocity
 Microsoft SQL Server family of memory-optimized and in-memory technologies
    xVelocity In-Memory Analytics Engine
    xVelocity Memory-Optimized Columnstore Indexes

 The xVelocity engine is designed with 3 principles in mind:
    Performance, Performance, Performance!
How Are These Performance Gains
Achieved?
 Two complementary technologies:
   Storage
      Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).
   New “batch mode” execution
      Vector-based query execution capability
      Data can then be processed in batches versus row-by-row
      Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O
Batch mode processing
 Process ~1000 rows at a time
 Vector operators implemented
 Greatly reduced CPU time (7 to 40X)

 [Figure: a batch object holding column vectors and a bitmap of qualifying rows]
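The idea of a batch (column vectors plus a bitmap of qualifying rows) can be sketched roughly as below. This is a conceptual model under stated assumptions, not the SQL Server query processor: the dictionary-based batch layout, the helper names, and the sample table are all hypothetical.

```python
BATCH_SIZE = 1000  # the slide gives roughly 1000 rows per batch


def make_batches(columns, batch_size=BATCH_SIZE):
    """Split column vectors into batches, each with a qualifying-rows bitmap."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        yield {
            "columns": {name: vec[start:end] for name, vec in columns.items()},
            "qualifying": [True] * (end - start),
        }


def filter_batch(batch, column, predicate):
    """Filter operator: clear bitmap flags instead of copying surviving rows."""
    vec = batch["columns"][column]
    batch["qualifying"] = [
        q and predicate(v) for q, v in zip(batch["qualifying"], vec)
    ]
    return batch


def sum_batch(batch, column):
    """Partial aggregate over the rows still marked as qualifying."""
    vec = batch["columns"][column]
    return sum(v for q, v in zip(batch["qualifying"], vec) if q)


# Hypothetical table with 2500 rows and two columns.
table = {"C2": [i % 5 for i in range(2500)], "C4": list(range(2500))}

total = 0
batch_count = 0
for batch in make_batches(table):
    filter_batch(batch, "C2", lambda v: v == 0)
    total += sum_batch(batch, "C4")
    batch_count += 1
print(batch_count, total)
```

Each operator is called once per batch rather than once per row, which is where the reduction in per-row overhead comes from in this model.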
Segment Elimination



 select Date, count(*)
 from dbo.Purchase
 where Date >= 20120201
 group by Date

 column_id  segment_id  min_data_id  max_data_id
 1          1           20120101     20120131
 1          2           20120115     20120215
 1          3           20120201     20120228
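Using the minimum and maximum values recorded per segment on this slide, the elimination decision for the predicate `Date >= 20120201` can be sketched as follows. The function name and the tuple layout are assumptions made for illustration; only the three metadata rows come from the slide.

```python
# Segment metadata from the slide: (segment_id, min_data_id, max_data_id).
segments = [
    (1, 20120101, 20120131),
    (2, 20120115, 20120215),
    (3, 20120201, 20120228),
]


def segments_to_scan(segments, lower_bound):
    """Keep only segments whose maximum can satisfy `Date >= lower_bound`."""
    return [
        seg_id
        for seg_id, _min_value, max_value in segments
        if max_value >= lower_bound
    ]


print(segments_to_scan(segments, 20120201))  # [2, 3]
```

Segment 1 is skipped without being read because its largest value, 20120131, is below the bound. Segment 2 still has to be scanned even though part of its range fails the predicate.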
Columnstore format + batch mode
Variations
   Columnstore indexing alone + traditional row mode in Query Processor
   Columnstore indexing + batch mode in Query Processor
   Columnstore indexing + hybrid of batch and traditional row mode in Query Processor
Plan operators supported in batch mode

 Filter
 Project
 Scan
 Local hash (partial) aggregation
 Hash inner join
 (Batch) hash table build
Query processing with
Columnstore Indexes
               v        25
Maintaining Data in a Columnstore Index

 Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
 ALTER INDEX REBUILD / REORGANIZE not allowed
 How can I modify index data?
    Drop columnstore index / make modifications / add columnstore index
    UNION ALL (but be sure to validate performance)
    Partition switches (IN and OUT)
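The UNION ALL option above can be pictured as a large read-only table combined at query time with a small writable one. The sketch below models only that idea in plain Python, as an assumption for illustration: `main_dates`, `delta_rows`, `insert`, and `count_by_date` are hypothetical names, and nothing here reflects SQL Server internals.

```python
from collections import Counter

# Hypothetical data. The "main" table carries the columnstore index and is
# read-only; new rows go to a small writable "delta" table without one.
main_dates = [20120101, 20120101, 20120102]  # Date column of the main table
delta_rows = []  # writable rows as (date, amount) tuples


def insert(date, amount):
    # Writes never touch the read-only main table.
    delta_rows.append((date, amount))


def count_by_date():
    # Same spirit as grouping over: main UNION ALL delta.
    counts = Counter(main_dates)
    counts.update(date for date, _amount in delta_rows)
    return dict(counts)


insert(20120102, 9.5)
insert(20120103, 1.0)
print(count_by_date())
```

The delta side is scanned without columnstore benefits, so it needs to stay small relative to the main table, which is one reason the slide advises validating performance.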
Insert data into table with
Columnstore Index
Summary


 SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
    10x to 100x performance improvement depending on the schema and query
        I/O reduction and memory savings through columnstore compressed storage
        CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
    Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
        No need to create intermediate tables, aggregates, pre-processing and cubes
    Interoperability with partitioning
Resources


  Columnar Storage in SQL Server 2012 (PDF)
  SQL Server Columnstore Performance Tuning
  Inside the SQL Server 2012 Columnstore Indexes
  24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
  SQL Server Columnstore Performance Tuning (video)




SQL SERVER 2012 - COLUMNSTORE INDEXES




 Denis Reznik

 Senior Database Architect at The Frayman Group

 Microsoft SQL Server MVP

 denisreznik@live.ru

 @denisreznik

 http://reznik.uneta.com.ua

Improve Data Warehouse Query Performance up to 100x with SQL Server 2012 Columnstore Indexes
