COLUMNSTORE INDEXES

SQL Server 2012

 Denis Reznik
 The Frayman Group
 denisreznik@live.ru
Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• CS indexes Internals
• Adding data to Columnstore index
Row Store and Column Store




 In row store, data is stored tuple by tuple.
 In column store, data is stored column by column.
Row Store and Column Store
 Most queries do not process all the attributes of a
 particular relation.

 [Figure: table columns id, name, address, city, state, age, with name and address set apart]

 SELECT c.name, c.address
 FROM Customers c
 WHERE c.region = 'Moscow'
Row Store and Column Store

Row Store                            Column Store
(+) Easy to add/modify a record      (+) Only need to read in relevant data
(-) Might read in unnecessary data   (-) Tuple writes require multiple accesses

   So column stores are suitable for read-mostly, read-intensive,
   large data repositories.
Compression

 Trades I/O for CPU
    Higher data value locality in column stores
    Techniques such as run length encoding far more useful
 Schemes
    Null Suppression
    Dictionary encoding
    Run Length encoding
    Bit-Vector encoding
    Heavyweight schemes
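
To make run-length encoding concrete, here is a small T-SQL sketch that collapses a column of repeated values into (value, run length) pairs. It only illustrates the idea: the table variable and its contents are made up, and this is not how the storage engine encodes data internally.

```sql
-- Hypothetical sample: one column's values in storage order.
DECLARE @StateColumn TABLE (pos int PRIMARY KEY, state char(2) NOT NULL);

INSERT INTO @StateColumn (pos, state)
VALUES (1, 'CA'), (2, 'CA'), (3, 'CA'), (4, 'NY'), (5, 'NY'), (6, 'TX'), (7, 'CA');

-- Consecutive equal values share the same (pos - ROW_NUMBER) offset,
-- so grouping by value and offset yields one row per run.
SELECT r.state, COUNT(*) AS run_length
FROM (
    SELECT pos, state,
           pos - ROW_NUMBER() OVER (PARTITION BY state ORDER BY pos) AS run_id
    FROM @StateColumn
) AS r
GROUP BY r.state, r.run_id
ORDER BY MIN(r.pos);
-- Returns four runs: CA 3, NY 2, TX 1, CA 1
```

Seven stored values shrink to four pairs. The longer the runs, as in sorted or low-cardinality columns, the bigger the saving, which is why this scheme pays off more in a column store than in a row store.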

Columnar storage structure



 [Figure: columnar layout of columns C1 to C6]

 Uses VertiPaq compression
Accelerating Data Warehouse Queries with
SQL Server 2012 Columnstore Indexes
Improved Data Warehouse Query performance


  Columnstore indexes provide an easy way to significantly improve
  data warehouse and decision support query performance against
  very large data sets.
  Performance improvements for “typical” data warehouse queries
  range from 10x to 100x.
  Ideal candidates include queries against star schemas that use
  filtering, aggregations and grouping against very large fact tables.
Good Candidates for Columnstore Indexing
 Table candidates:
    Very large fact tables (for example – billions of rows)
    Larger dimension tables (millions of rows) with compression friendly column
    data
    If unsure, it is easy to create a columnstore index and test the impact on
    your query workload
 Query candidates (against table with a columnstore index):
    Scan versus seek (columnstore indexes don’t support seek operations)
    Aggregated results far smaller than table size
    Joins to smaller dimension tables
    Filtering on fact / dimension tables – star schema pattern
    Sub-set of columns (being selective in columns versus returning ALL
    columns)
Creating a columnstore index

T-SQL




SSMS
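
For the T-SQL route, a minimal sketch looks like the following. The table, its columns, and the index name are hypothetical; only the `CREATE NONCLUSTERED COLUMNSTORE INDEX` statement itself is the point.

```sql
-- Hypothetical fact table; names and columns are illustrative only.
CREATE TABLE dbo.FactSales (
    SaleId      bigint NOT NULL,
    DateKey     int    NOT NULL,
    CustomerKey int    NOT NULL,
    Quantity    int    NOT NULL,
    Amount      money  NOT NULL
);

-- SQL Server 2012 supports only nonclustered columnstore indexes.
-- List the columns that analytic queries read (often all of them).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (SaleId, DateKey, CustomerKey, Quantity, Amount);
```

Including every column that queries touch is a common choice, because a query that needs a column missing from the index cannot be answered from the columnstore alone.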




Defining the Columnstore Index

 Columnstore index is nonclustered (secondary)
 Base table can be clustered index or heap
 One CS index per table
 Multiple other nonclustered (B-tree) indexes allowed
    But may not be needed
 CS index must be partition-aligned if table is partitioned

 [Figure: base table (clustered index OR heap) with nonclustered index, nonclustered index, nonclustered columnstore index; indexed view; filtered index]
Column Segments and Dictionaries

 [Figure: columns C1 to C6 divided into segment 1 … segment N, each a set of about 1M rows; one column's slice of a segment is a column segment; dictionaries are stored alongside]
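
Segments and dictionaries can be inspected through catalog views. The sketch below assumes a hypothetical table `dbo.FactSales` that already has a columnstore index; if no such table exists, the queries simply return no rows.

```sql
-- Segment count, row total and compressed size per index column.
-- dbo.FactSales is a hypothetical table name.
SELECT s.column_id,
       COUNT(*)                        AS segment_count,
       SUM(CAST(s.row_count AS bigint)) AS total_rows,
       SUM(s.on_disk_size)             AS size_bytes
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
GROUP BY s.column_id
ORDER BY s.column_id;

-- Dictionaries used by the same index.
SELECT d.column_id, d.dictionary_id, d.entry_count, d.on_disk_size
FROM sys.column_store_dictionaries AS d
JOIN sys.partitions AS p ON p.hobt_id = d.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY d.column_id, d.dictionary_id;
```

Comparing `size_bytes` across columns gives a rough picture of which columns compress well and which dominate the index size.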
Memory management

• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
    • Columnstore segments flow between disk and memory

 SELECT C2, SUM(C4)
 FROM T
 GROUP BY C2;

 [Figure: table T stored as column segments T.C1 to T.C4; the T.C2 and T.C4 segments are shown fetched for the query]
Look inside Columnstore Indexes
xVelocity
 Microsoft SQL Server family of memory-optimized and
 in-memory technologies
    xVelocity In-Memory Analytics Engine
    xVelocity Memory-Optimized Columnstore Indexes

 The xVelocity engine is designed with 3 principles in
 mind:
    Performance, Performance, Performance!
How Are These Performance Gains Achieved?
 Two complementary technologies:
   Storage
      Data is stored in a compressed columnar data format (stored
      by column) instead of row store format (stored by row).
   New “batch mode” execution
      Vector-based query execution capability
      Data can then be processed in batches versus row-by-row
      Depending on filtering and other factors, a query may also
      benefit from “segment elimination”: bypassing million-row
      chunks (segments) of data, further reducing I/O
Batch mode processing
 Process ~1000 rows at a time
 Vector operators implemented
 Greatly reduced CPU time (7 to 40X)

 [Figure: a batch object made up of column vectors and a bitmap of qualifying rows]
Segment Elimination



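 select Date, count(*)
 from dbo.Purchase
 where Date >= 20120201
 group by Date

 column_id  segment_id  min_data_id  max_data_id
 1          1           20120101     20120131
 1          2           20120115     20120215
 1          3           20120201     20120228

 Segment 1 can be skipped: its max_data_id (20120131) is below the filter
 value, so only segments 2 and 3 are read.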
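
The per-segment ranges shown in the table can be read from `sys.column_store_segments`. This is a sketch that assumes `dbo.Purchase` has a columnstore index; note that, depending on the column's encoding, `min_data_id` and `max_data_id` may hold encoded ids rather than the raw column values, and `column_id` here is the view's own numbering.

```sql
-- One row per column segment with the value range used for elimination.
-- Returns no rows if dbo.Purchase has no columnstore index.
SELECT s.column_id,
       s.segment_id,
       s.row_count,
       s.min_data_id,
       s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.Purchase')
ORDER BY s.column_id, s.segment_id;
```

Heavily overlapping ranges between segments, as with segments 1 and 2 in the table, mean fewer segments can be skipped for a given filter.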
Columnstore format + batch mode Variations
   Columnstore indexing alone + traditional row mode in Query Processor
   Columnstore indexing + batch mode in Query Processor
   Columnstore indexing + hybrid of batch and traditional row mode in
   Query Processor
Plan operators supported in batch mode

 Filter
 Project
 Scan
 Local hash (partial) aggregation
 Hash inner join
 (Batch) hash table build
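
A query built only from the operators listed above would look roughly like this sketch: a scan and filter on the fact table, a hash inner join to a small dimension, and a grouped aggregation. All table and index names are hypothetical. Whether the optimizer actually chooses batch mode depends on factors such as data volume and parallelism, so with the empty tables created here a row-mode plan is the likely outcome.

```sql
-- Hypothetical star-schema tables; names are illustrative only.
CREATE TABLE dbo.DimStore (
    StoreKey int         NOT NULL PRIMARY KEY,
    Region   varchar(20) NOT NULL
);
CREATE TABLE dbo.FactOrders (
    StoreKey int   NOT NULL,
    DateKey  int   NOT NULL,
    Amount   money NOT NULL
);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactOrders
ON dbo.FactOrders (StoreKey, DateKey, Amount);

-- Return the actual execution plan alongside the results.
SET STATISTICS XML ON;

SELECT d.Region, SUM(f.Amount) AS TotalAmount
FROM dbo.FactOrders AS f
INNER JOIN dbo.DimStore AS d ON d.StoreKey = f.StoreKey
WHERE f.DateKey >= 20120201
GROUP BY d.Region;

SET STATISTICS XML OFF;
```

In the returned plan, each operator reports its execution mode (Batch or Row), which shows which of the three variations the query ended up with.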
Query processing with Columnstore Indexes
Maintaining Data in a Columnstore Index

 Once built, the table becomes “read-only” and
 INSERT/UPDATE/DELETE/MERGE is no longer allowed
 ALTER INDEX REBUILD / REORGANIZE not allowed
 How can I modify index data?
    Drop columnstore index / make modifications / add columnstore index
    UNION ALL (but be sure to validate performance)
    Partition switches (IN and OUT)
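
The first two workarounds can be sketched as follows. Table and index names are hypothetical, and the delta-table layout is just one way to apply the UNION ALL idea.

```sql
-- Hypothetical tables: dbo.FactEvents carries the columnstore index,
-- dbo.FactEvents_Delta is a small rowstore table that stays writable.
CREATE TABLE dbo.FactEvents (
    EventId bigint NOT NULL, DateKey int NOT NULL, Amount money NOT NULL);
CREATE TABLE dbo.FactEvents_Delta (
    EventId bigint NOT NULL, DateKey int NOT NULL, Amount money NOT NULL);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactEvents
ON dbo.FactEvents (EventId, DateKey, Amount);

-- Option 1: drop the index, modify the data, recreate the index.
DROP INDEX NCCI_FactEvents ON dbo.FactEvents;
INSERT INTO dbo.FactEvents (EventId, DateKey, Amount)
VALUES (1, 20120201, 10.00);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactEvents
ON dbo.FactEvents (EventId, DateKey, Amount);

-- Option 2: new rows go to the delta table; queries combine both.
INSERT INTO dbo.FactEvents_Delta (EventId, DateKey, Amount)
VALUES (2, 20120202, 5.00);

SELECT DateKey, SUM(Amount) AS TotalAmount
FROM (
    SELECT DateKey, Amount FROM dbo.FactEvents
    UNION ALL
    SELECT DateKey, Amount FROM dbo.FactEvents_Delta
) AS AllEvents
GROUP BY DateKey;
```

Option 1 pays the full index build cost on every change, so it suits infrequent bulk loads. Option 2 keeps the large table untouched, but the plan for the combined query should be checked, as the slide warns.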
Insert data into table with Columnstore Index
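
A partition switch-in, the third option for loading data, might look like this sketch. The partition function, scheme, tables, and boundary values are all hypothetical; the staging table has to match the target's structure and indexes and carry a CHECK constraint that fits the target partition's range.

```sql
-- Hypothetical monthly partitioning on an int date key.
CREATE PARTITION FUNCTION pfMonth (int)
AS RANGE RIGHT FOR VALUES (20120201, 20120301);
CREATE PARTITION SCHEME psMonth
AS PARTITION pfMonth ALL TO ([PRIMARY]);

-- Partitioned fact table with a partition-aligned columnstore index.
CREATE TABLE dbo.FactLoad (
    DateKey int NOT NULL, Amount money NOT NULL
) ON psMonth (DateKey);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactLoad
ON dbo.FactLoad (DateKey, Amount);

-- Staging table on the same filegroup, constrained to February 2012.
CREATE TABLE dbo.FactLoad_Stage (
    DateKey int NOT NULL
        CONSTRAINT CK_FactLoad_Stage_Feb
        CHECK (DateKey >= 20120201 AND DateKey < 20120301),
    Amount money NOT NULL
) ON [PRIMARY];

-- Load first, while the staging table is still writable.
INSERT INTO dbo.FactLoad_Stage (DateKey, Amount) VALUES (20120215, 42.00);

-- Then build the matching columnstore index on the staging table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactLoad_Stage
ON dbo.FactLoad_Stage (DateKey, Amount);

-- With RANGE RIGHT, partition 2 holds 20120201 <= DateKey < 20120301.
ALTER TABLE dbo.FactLoad_Stage SWITCH TO dbo.FactLoad PARTITION 2;
```

The switch is a metadata operation, so the large table never loses its columnstore index; only the staging table pays the index build cost.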
Summary


 SQL Server 2012 offers significantly faster query performance
 for data warehouse and decision support scenarios
    10x to 100x performance improvement depending on the schema
    and query
        I/O reduction and memory savings through columnstore compressed
        storage
        CPU reduction with batch versus row processing, further I/O reduction if
        segment elimination occurs
    Easy to deploy and requires less management than some legacy
    ROLAP or OLAP methods
        No need to create intermediate tables, aggregates, pre-processing and
        cubes
    Interoperability with partitioning
Resources


  Columnar Storage in SQL Server 2012 (PDF)
  SQL Server Columnstore Performance Tuning
  Inside the SQL Server 2012 Columnstore Indexes
  24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
  SQL Server Columnstore Performance Tuning (video)




SQL SERVER 2012 - COLUMNSTORE INDEXES




 Denis Reznik

 Senior Database Architect at The Frayman Group

 Microsoft SQL Server MVP

 denisreznik@live.ru

 @denisreznik

 http://reznik.uneta.com.ua

SQL Rally 2013: Columnstore Indexes
