VertiPaq-ing with SQL 2012 Columnstore Indexes
xVelocity

Abstract

Significant work on the column store has changed the storage paradigm for data warehousing. The column store, supported by vector-based query execution and substantial progress in data compression, has emerged as a potential game changer. Microsoft is targeting next-generation technologies, with increasing use of in-memory and memory-optimized techniques in SQL 2012. One of the most anticipated implementations is xVelocity Column Store Indexes. This paper discusses the column store in detail, touching on the basics, limitations and design considerations, with demonstration examples. "Columnar" and "column store" are used interchangeably in the discussion below.
Audiences

This paper targets IT planners, architects, BI users, CTOs and CIOs who are evaluating SQL 2012 to answer their large-data business needs. It also targets SQL 2012 enthusiasts, to provide new dimensions and out-of-the-box thinking to organizations that maintain data using SQL 2012.

Overview

Data is growing exponentially, and performance is becoming a recurring cost for organizations. Performance impact can be broadly factored into (1) I/O-based operations, (2) memory-based operations and (3) operations that transfer data over the network or other peripherals. I/O and memory are highly dependent on the method of storage and querying. Queries can be seen as (1) read-only workloads, which are mostly reporting and DW systems, and (2) read-write workloads, which are mostly OLTP systems. The potential game changer for read-only workloads is a storage method that minimizes I/O and memory-based operations, whereas a conventional RDBMS stores data in a row-based storage system. Based on the columnar design, a speedup in queries of 10X up to 1000X can be seen.

For instance, suppose the employee information looks like this:

ID      Name           Street Address
32498   Diamond John   Crouse Manson
45298   Mary Anglos    Wilson Street

Acronyms

RDBMS   Relational Database Management System
HCC     Hybrid Columnar Compression
I/O     Input Output
FCC     Full Columnar Compression
N/W     Network
LOB     Large Objects
OLTP    Online Transaction Processing
OLAP    Online Analytical Processing
DW      Data Warehouse
CSI     Column Store Indexes
ETL     Extract Transform & Load
In a conventional RDBMS (e.g. SQL Server 2008) this data is stored row by row, as shown in Diagram-1 below, whereas columnar storage (e.g. SQL Server 2012 using a column store index) stores it column by column, as shown in Diagram-2 below.

Diagram-1
Diagram-2

VertiPaq-ing with xVelocity Columnar Indexes

Why Columnar Indexes

There is a great debate about the columnar structure. Below are the benefits of using columnar indexes, specifically in SQL Server 2012.

Astonishing Results

The idea is to start with a result-driven discussion. Below are the graphs of the query performance results. We started with 12.5 million rows and doubled the volume every time, up to 400 million records, to get the sales across products. The comparison of query time for "Column Store Index" vs. "Conventional Indexes" is exceptionally revealing in favor of column store indexes. Graph 1 and graph 2 show a gap of hundreds of seconds. The results also show that the warm cache takes far less time than the cold cache. Graph-3 below shows the gain, in "X" number of times, for warm and cold cache. The results make an encouraging case for the use of column store indexes.
Graph-3: Query Execution - Cold Cache. Query execution time (in sec.) against number of rows (in millions, 12.5 to 400), Column Store Indexes vs. Conventional Indexes.
Graph-4: Query Execution - Warm Cache. Query execution time (in sec.) against number of rows (in millions, 12.5 to 400), Column Store Indexes vs. Conventional Indexes.
Gain in query performance - Warm Vs. Cold Cache. Performance gain (number of times) against number of rows (in millions, 12.5 to 400), Warm Cache vs. Cold Cache.
  Faster Query Performance
Cost Saving
VertiPaq-ing & Apollo
VertiPaq-ing is vertical partitioning of the data, in other words storing the data in column-wise fashion. Diagram-3 shows the difference between the row store and column store data layouts in terms of pages, the basic unit of storage. For a detailed discussion, refer to the Basics Behind the Scenes section below. The goal is to accelerate common DW queries.
Apollo is the code name for the new feature available in SQL 2012, targeting
xVelocity

xVelocity is the term used by the SQL Server family for its next-generation technologies. These technologies target very high query performance on modern hardware. They are optimized to use multiple cores and large memory. The same techniques are also used in Analysis Services and PowerPivot. Portions of data are moved in and out of memory based on the memory available in the machine. In short, they are highly optimized in-memory operations. Below is a screenshot of CPU utilization by the xVelocity technologies, taken during column store index creation.

Graph-7

Basics Behind the Scenes

Full Column Store & Hybrid Column Store

SQL 2012 uses full columnar storage, where each column is compressed and stored together. This technique has its own advantages, but it may negatively impact performance when a query accesses many columns or when a small number of updates is performed (although SQL 2012 has read-only indexes). Refer to diagram-2 and 3 for details.

On the other hand, the hybrid column store uses both rows and columns to store data. The hybrid technique creates a column vector for each column, compresses it and stores it in data blocks. A compression unit contains more than one data block, and it contains all columns for a row. The rows span multiple data blocks. Diagram-5 shows the concept in detail. This way a large amount of compression is achieved, and the performance issues of full columnar databases are also mitigated.

Graph-8
In the warehousing scenario, the HCC approach often performs less well.

Segments & Dictionaries

Columnar indexes are physically stored in the form of segments. Typically the data of each column is broken into segments of one million rows (a.k.a. row groups). The segments are stored as LOBs and can span multiple pages. The index build process runs in parallel and creates as many full segments as possible, but some segments can be comparatively small. Segments store highly compressed values because a segment holds values of a single data type. For heavily repeated data the compression is even better, because a small unique symbol is stored for each duplicate value, which reduces the size considerably. Segments also have header records containing max_data_id, min_data_id etc. This header information is used to skip complete segments, commonly known as segment elimination. The Anti-Patterns part gives more detail about segments.

A value reference links to an entry in one of up to two hash dictionaries. Dictionaries are kept in memory, and the data value ids from the segment are resolved against these dictionaries, but this lookup is deferred as long as possible for performance reasons. Simply put, for a table with one partition, every column added to the column store index is added as a row in the segment metadata.

Batch Mode Processing & Row Processing

Query processing can be done either in row mode or in batch mode. Taking a join as an example, a physical join operation takes two sets as input and produces an output set based on the join conditions. In row processing each of these sets is processed row by row (e.g. nested loop join), and a large amount of CPU is used. Most of the time, when operating on a large amount of data, the operation also spills to disk (mostly in hash joins), which can be checked through tempdb usage, and it also increases the memory used for processing.

Vector processing was one of the biggest revolutions, and it brought the fundamentals of batch processing. The physical operators for query processing take a batch of rows in the form of an array (of the same type) and process the data. A batch typically consists of 1000 rows of data, and each column within the batch is stored as a vector in memory, which is known as vector-based query processing. It uses the latest algorithms to exploit multicore CPUs and modern hardware. Batch processing works on compressed data when possible and thus reduces the CPU overhead of join operations, filters etc. (only for some of the operators).
Demonstration Example

For demonstration purposes the Contoso Retail DW, made available by Microsoft, is used.

Creation of Columnar Index

A column store index can be created either from the index creation wizard or with a T-SQL query. Ultimately both are the same. The basic T-SQL syntax is as below.

Creation of Columnar Index – Code Block 1
    CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name
        ON <tablename> ( column [ ,...n ] )
        [ WITH ( <column_index_option> [ ,...n ] ) ]
        [ ON {
               { partition_scheme_name ( column_name ) }
               | filegroup_name
               | "default"
             }
        ]

The steps below can be followed to create the column store index from the index creation wizard.
Graph-8
Graph-9
Performance Observations

The performance check was done on the table 'dbo.FactOnlineSales' from ContosoRetailDW, which has 12.6 million records. More facts and limitations are detailed with examples in the Anti-Patterns section below.

Graph-10
Design Considerations

Candidates for Column Store Index

DW scenarios most commonly follow a pattern of read-only data where data is appended periodically, commonly using a sliding window pattern. They seldom have updates. Data is retained for a long time, at least 8 to 10 years, resulting in huge volumes of data in gigabytes, terabytes or even petabytes in some scenarios. DW data is mostly organized in a star or snowflake pattern, where the fact table contains millions or billions of records ready to be aggregated in different ways. These schemas are typically queried using star join queries with grouping and aggregation. Column store indexes are designed to accelerate queries that satisfy these criteria, which makes CSI an excellent fit for DW scenarios. So the rule of thumb is that large fact tables are the candidates for CSI. Security of the data is not a big concern, because CSI also supports Transparent Data Encryption (TDE).

Another question is which columns need to be added to the CSI. The answer seems easy: all columns can be included as long as they meet the prerequisites listed in the Anti-Patterns section. This holds for a small or medium-scale DW, because audit columns or some of the text columns in the fact tables do not take considerable space. Although the algorithm is designed to compress at large scale, the best practice is still to include only the dimension keys and measures from the table.

Fact-less fact tables and multivalued dimensions are not always a perfect fit, because they do not gain the benefit of batch processing, but the advantages of compression, parallel reads and segment elimination are still there. Below is an example of choosing candidate tables for CSI. The selection is mostly based on the number of rows, and the candidates will mostly be fact tables.

Candidates for Column Store Index - Code Block 1
    --Choose candidate tables for CSI
    SELECT O.name AS TableName
          ,SUM(P.rows) AS CountOfRows
    FROM sys.partitions P
    JOIN sys.objects O ON P.object_id = O.object_id
    WHERE O.type = 'U' --user tables
      AND P.index_id IN (0, 1) --heap or clustered index only, so rows are not counted once per index
    GROUP BY O.name
    ORDER BY 2 DESC

Diagram-9
Graph-11

Below is an example of choosing candidate columns for the fact table, using FactOnlineSales. A mark is used to show the columns selected for the dimensions. Along with them, all the measures are also included while creating the CSI. We ignore the audit and degenerate dimension columns here, e.g. SalesOrderNumber, SalesOrderLineNumber, ETLLoadID, LoadDate and UpdateDate. OnlineSalesKey has the primary key defined on it, so it is automatically added to the CSI even if it is not mentioned in the column list. The corresponding SQL code is 'Candidates for Column Store Index – Code Block 2' in the SQL file.
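As a sketch of the column selection described above, the statement below keeps the dimension keys and measures of FactOnlineSales and leaves out the audit and degenerate dimension columns. The index name csiFactOnlineSales is the one used elsewhere in this paper. The column list is an assumption based on the standard ContosoRetailDW sample schema and should be checked against the actual table before use. It is not the paper's own Code Block 2.

```sql
-- Sketch only: column names are assumed from the ContosoRetailDW sample schema.
CREATE NONCLUSTERED COLUMNSTORE INDEX csiFactOnlineSales
ON dbo.FactOnlineSales
(
     OnlineSalesKey    -- primary key, listed explicitly for clarity
    ,DateKey           -- dimension keys
    ,StoreKey
    ,ProductKey
    ,PromotionKey
    ,CurrencyKey
    ,CustomerKey
    ,SalesQuantity     -- measures
    ,SalesAmount
    ,ReturnQuantity
    ,ReturnAmount
    ,DiscountQuantity
    ,DiscountAmount
    ,TotalCost
    ,UnitCost
    ,UnitPrice
);
-- Left out on purpose: SalesOrderNumber, SalesOrderLineNumber (degenerate dimension)
-- and ETLLoadID, LoadDate, UpdateDate (audit columns).
```

Once the index exists the table becomes read-only, so a statement like this is best run after the load has finished.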
Graph-12

The SQL code 'Candidates for Column Store Index – Code Block 3' in the corresponding SQL file contains an example of a query whose results are accelerated to within a second. Both star and snowflake schema queries benefit from CSI. Snowflake queries may have issues if any of the primary or secondary snowflake dimensions is too large to support a batch join.

Anti-Patterns

Design considerations always sit within defined boundaries. Anti-patterns and limitations provide the foundation for deciding on those boundaries, which makes them the leading voice in any design decision.

• Only one CSI can be created on a table. An attempt to create a second one returns the error below.
    Msg 35339, Level 16, State 1, Line 1
    Multiple nonclustered columnstore indexes are not supported.
• The key column concept is not relevant in CSI, because data is stored in columnar fashion and each column is stored on its own. Having a clustered key makes a difference only while creating the column store index, in terms of reads and order, but it has no impact on query performance.
• The base table on which the index is created is read-only, i.e. it can't be updated or altered. Managing updates is covered below.
• Interestingly, the order of the columns in the create index statement has no impact on either index creation or query performance.
• Only limited data types are allowed for CSI: int, bigint, smallint, tinyint, money, smallmoney, bit, float, real, char(n), varchar(n), nchar(n), nvarchar(n), date, datetime, datetime2, smalldatetime, time, datetimeoffset with precision <= 2, and decimal/numeric with precision <= 18.
• A CSI can have at most 1024 columns and does not support:
  - Sparse and computed columns
  - Indexed views or views
  - Filtered and clustered indexes
  - The INCLUDE, ASC, DESC and FORCESEEK keywords
  - Page and row compression, and the vardecimal storage format
  - Replication, change tracking, change data capture and Filestream
• A CSI can simply be ignored using 'IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX'. With this option the user does not need to know the names of the other indexes. This is even more helpful when the other index names are left to automatic naming by SQL Server and are hard to know while writing queries.

Anti-Patterns - Code Block 1
    SELECT P.BrandName Product
          ,SUM(SalesAmount) Sales
    FROM dbo.FactOnlineSales S
    JOIN dbo.DimProduct P ON S.ProductKey = P.ProductKey
    GROUP BY P.BrandName
    OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

• Only the operators below support batch mode processing and therefore benefit fully from CSI.
  - Filter
  - Project
  - Scan
  - Local hash (partial) aggregation
  - Hash inner join
  - (Batch) hash table build
• Outer joins, MAXDOP 1, NOT IN and UNION ALL are not supported for batch mode execution. The existing query can instead be tweaked to achieve the same result. Some examples are given below.
• Although filters on a CSI are pushed down to the segments to benefit from segment elimination, string filters do not have max or min values and hence do not use it. String filters or joins should therefore be avoided on CSI.
• It was observed during the above investigation that after partition switching, the first query compilation takes a lot of time. Similar behavior was not found on insertion or deletion of data from the table. It may be caused by estimation changes due to partition switching, mainly in large data scenarios. It is recommended to warm the cache after partition switching.

Managing Column Store Indexes

Memory Considerations

Column Store Indexes (CSI) is a technology designed for modern hardware with multiple CPUs and large memory, operating on large amounts of data, especially terabytes. For CSI, memory is used at creation time and at execution time. We'll discuss them separately.

Creating a CSI is a parallel operation. It depends on the available CPUs and the MAXDOP setting restrictions. For large data, creation of a CSI takes comparatively more time than creation of B-Tree indexes. Before the creation, a memory estimate is made for the query execution and that memory grant is provided. There may be cases where the initial memory request is not granted and error code 8657 or 8658 is raised. This can be resolved by granting enough memory to the server and the corresponding workgroup. There can be requests for more memory at a later point of execution, and if enough memory is not available the insufficient memory error, i.e. error code 701 or 802, can be raised.

The latter error codes come at run time during the execution, whereas the former come at the start of the query execution. The solution is to change the memory grant for the workgroup or to increase the memory in the server. Errors 8657 or 8658 can sometimes occur because of the SQL Server configuration of 'min server memory' and 'max server memory'. Suppose the minimum memory needed for the CSI is 3 GB and SQL Server has taken only 1 GB of memory due to the min server memory configuration; then the error can happen. The resolution is either to run a COUNT(*) query on any of the large tables before the index creation, or to set the min and max server memory values to the same number. This helps SQL Server take the required memory at start time. The resolution for the other two errors is documented under error codes 701 and 802. In conclusion, a CSI can't be created if there is not enough memory in the system. One of the easiest solutions for such memory constraints is vertical partitioning of the existing table, i.e. breaking the existing table into two or more tables.

CSI uses batch mode processing for execution. Typically a batch consists of 1000 rows stored in a vector. This type of processing is optimized for modern hardware and provides better parallelism. Batch operators can work on compressed data, resulting in a high degree of processing in a small amount of memory. A considerable amount of memory is needed to execute batch mode query processing. If the memory is not available, the optimizer changes the query plan to use row mode. We can check the use of batch mode processing in the query plan, as below. Batch mode processing always works in memory, so whenever there is a disk spill due to large data, row-by-row processing replaces the batches. This is mostly seen during hash joins on large tables. Another reason for row-by-row processing is out-of-date statistics, which in turn spill data to disk and result in row-by-row operation. To check for this, the extended event 'batch_hash_table_build_bailout' can be configured. The warning 'Operator used tempdb to spill data during execution' is also raised for this kind of behavior.

Graph-13

Add & Modify Data in Column Store Index

A table with a CSI is read-only, i.e. we can't perform operations like INSERT, UPDATE, DELETE or MERGE. These operations fail with an error message, e.g.

    Msg 35330, Level 15, State 1, Line 1
    UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.
Considering this, we have the options or workarounds below.

• Disable the CSI, perform the change and rebuild the CSI, as the error message suggests.

Add & Modify Data - Code Block 1
    ALTER INDEX csiFactOnlineSales ON dbo.FactOnlineSales DISABLE
    GO
    UPDATE dbo.FactOnlineSales SET SalesAmount = SalesAmount * 2
    GO
    ALTER INDEX csiFactOnlineSales ON dbo.FactOnlineSales REBUILD

• Have staging/work tables without a CSI (in most cases these are drop-and-recreate tables). Load the staging table, create the CSI on it and switch it into an empty partition of the table. We have to make sure that an empty partition exists, because if there is data in the partition and a CSI is created on the table, we can't split it. The corresponding SQL code is 'Add & Modify Data – Code Block 2' in the SQL file.
• Switch a partition from the table to an empty staging table. Drop the CSI from the staging table, perform the updates, inserts etc., build the CSI again and switch the staging table back into the partition (left empty by the previous switch). The corresponding SQL code is 'Add & Modify Data – Code Block 3' in the SQL file.
• Create different underlying tables to represent a fact table and access all of them through UNION ALL views. Just disable the index on the most recent table, which receives the updates, and then rebuild or recreate the CSI. The data can always be read through the UNION ALL views.
• Put the data into a staging table, create the CSI on the staging table, then drop the existing table and rename the staging table to the original name (it is better to do both operations in one transaction; note that both are metadata-only operations). This takes more processing time but ensures high availability. This option should be chosen only when the table holds a relatively small or medium amount of data.

Size of Column Store Index

The size of the CSI is based on the size of the segments and dictionaries. Most of the space is used by the segments. The size can also be obtained in a simpler manner; the SQL file contains both the simple and the actual size estimation query.

Statistics are another valuable consideration. We have statistics for the base table that has the CSI, but not for the CSI in particular. A statistics object is created for the CSI, but SHOW_STATISTICS shows null for the CSI and shows values for the clustered index. The statistics object for the CSI is used for database cloning (a DB clone is a statistics-only copy of the database, used for investigating query plan issues).

The corresponding SQL code is 'Size of Column Store Index – Code Block 1' in the SQL file.
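The size calculation described above can be sketched as follows: add up the on-disk size of all segments and all dictionaries that belong to the column store index on the table. This is a minimal sketch against the SQL Server 2012 catalog views sys.column_store_segments and sys.column_store_dictionaries, and it is not necessarily identical to the code block in the SQL file. The table name is taken from the demonstration example.

```sql
-- Sketch: total on-disk size of the column store index on dbo.FactOnlineSales,
-- computed as segments + dictionaries (on_disk_size is reported in bytes).
SELECT SUM(X.on_disk_size) / (1024.0 * 1024.0) AS TotalSizeInMB
FROM
(
    -- all segments of the column store index
    SELECT S.on_disk_size
    FROM sys.indexes I
    JOIN sys.partitions P
        ON P.object_id = I.object_id AND P.index_id = I.index_id
    JOIN sys.column_store_segments S
        ON S.hobt_id = P.hobt_id
    WHERE I.object_id = OBJECT_ID('dbo.FactOnlineSales')
      AND I.type_desc = 'NONCLUSTERED COLUMNSTORE'

    UNION ALL

    -- all dictionaries of the column store index
    SELECT D.on_disk_size
    FROM sys.indexes I
    JOIN sys.partitions P
        ON P.object_id = I.object_id AND P.index_id = I.index_id
    JOIN sys.column_store_dictionaries D
        ON D.hobt_id = P.hobt_id
    WHERE I.object_id = OBJECT_ID('dbo.FactOnlineSales')
      AND I.type_desc = 'NONCLUSTERED COLUMNSTORE'
) X;
```

Running the two halves of the UNION ALL separately shows how much of the space goes to segments and how much to dictionaries.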
Column Store Indexes Vs. Conventional Indexes

Column Store Index vs. Clustered Indexes

CSI is different from all other conventional indexes. They are tools for different types of scenarios. Until now we have seen that CSI is a lot faster than the conventional indexes. Below is an example where the CSI takes almost 99% of the relative cost in the query plan. Here we are using a highly selective query, i.e. only a few records are queried, using both of the indexes. Please note that we are using only those columns which are used in the CSI creation. The comparison among indexes always depends on the nature of the use of the data, i.e. the queries. SQL Server's optimizer chooses the index automatically. Moreover, plan guides can be pinned for queries that behave abnormally. For an apples-to-apples comparison we are using only the columns used to create the CSI.

Column Store Index vs. Clustered Indexes – Code Block 1
    SELECT SalesAmount
          ,ProductKey
    FROM dbo.FactOnlineSales S WITH (INDEX(PK_FactOnlineSales_SalesKey))
    WHERE OnlineSalesKey IN (32188091,23560484,31560484,27560484)

    SELECT SalesAmount
          ,ProductKey
    FROM dbo.FactOnlineSales S WITH (INDEX(csiFactOnlineSales))
    WHERE OnlineSalesKey IN (32188091,23560484,31560484,27560484)

Graph-14

Column Store Index Vs. Covering Indexes Vs. One Index Each Column

Covering index is a widely used term for achieving high-performing queries. Creating a covering index is always a cautious decision. It is very difficult to create indexes which cover all the queries, particularly in data warehousing scenarios where users are free to run any kind of query. A covering index can be achieved either by adding the columns to the index key, i.e. a composite index, or by attaching them to the B-Tree using the INCLUDE keyword.

Whether to use a CSI or a covering index again depends on the amount of data, the query and the memory. On the same note, CSI uses compression as well as batch mode processing, hence faster scans. If we have a complete star schema for our DW, CSI is best for aggregate queries. It also reduces index design and maintenance time, and one index does all of the magic.

Selecting one more column can make a covering index ineffective, which is not the case with a normal index. Creating one index per column is not useful when selecting multiple columns. Moreover, covering and other indexes together take a relatively large footprint on disk, because they are multiple copies of the same data, resulting in more maintenance and sometimes adding downtime to the application.
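The two ways of building a covering index mentioned above can be sketched as below. The index names and the filter value are hypothetical, chosen only for illustration. Both indexes cover a query that filters on ProductKey and returns SalesAmount, and both stop covering as soon as one more column is selected.

```sql
-- Option 1 (composite index): SalesAmount is part of the index key.
-- Index name is hypothetical.
CREATE NONCLUSTERED INDEX ixFactOnlineSales_Product_Amount
ON dbo.FactOnlineSales (ProductKey, SalesAmount);

-- Option 2 (INCLUDE): SalesAmount is stored only at the leaf level of the B-Tree.
-- Index name is hypothetical.
CREATE NONCLUSTERED INDEX ixFactOnlineSales_Product_InclAmount
ON dbo.FactOnlineSales (ProductKey)
INCLUDE (SalesAmount);

-- Covered by either index: only ProductKey and SalesAmount are referenced.
SELECT ProductKey
      ,SUM(SalesAmount) AS Sales
FROM dbo.FactOnlineSales
WHERE ProductKey = 782          -- hypothetical key value
GROUP BY ProductKey;

-- No longer covered: SalesQuantity is in neither index, so the optimizer
-- has to use lookups or pick another access path.
SELECT ProductKey
      ,SUM(SalesAmount)   AS Sales
      ,SUM(SalesQuantity) AS Quantity
FROM dbo.FactOnlineSales
WHERE ProductKey = 782          -- hypothetical key value
GROUP BY ProductKey;
```

A single CSI that contains all dimension keys and measures serves both queries without any additional index design.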
On the other hand, here is another example, which shows that CSI does not bring much benefit in query execution time when large hash joins cause batch execution to fall back to row-by-row execution. Here we just create an example table and join it with FactOnlineSales. Both tables have the same cardinality. We can easily see a warning message and a warning icon in the actual query plan. The corresponding SQL code is 'Column store index Vs. Covering Indexes – Code Block 1' in the SQL file.

Graph-15
Graph-16

Performance Tuning Considerations

Analyzing TempDB Uses

TempDB is the core of all temporary operations for which memory is not granted. SQL Server uses tempdb extensively, and users who have read-only permissions on any of the databases still get read-write permissions on tempdb. The point of analysis is tempdb usage while creating and querying a CSI. Surprisingly, tempdb was not used during creation or at query time. Tempdb is used when execution is done row by row instead of in batches, because data is then spilled to disk, i.e. to tempdb. The example above shows this behavior. The corresponding SQL code is 'Analyzing TempDB Uses – Code Block 1' in the SQL file.
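One possible way to check the tempdb usage described above is the dynamic management view sys.dm_db_session_space_usage. This is a sketch and not necessarily the approach used in the paper's SQL file. The counters are cumulative per session, so the query should be run in the same session, as a separate batch, before and after the statement being analysed, and the two results compared.

```sql
-- Sketch: tempdb pages allocated by the current session, converted to MB
-- (one page is 8 KB). Internal objects include hash and sort spills;
-- user objects include temp tables and table variables.
SELECT session_id
      ,internal_objects_alloc_page_count * 8 / 1024.0 AS InternalObjectsAllocMB
      ,user_objects_alloc_page_count     * 8 / 1024.0 AS UserObjectsAllocMB
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;
```

If InternalObjectsAllocMB grows after a query on the CSI, the query most likely spilled to tempdb, which fits the row-by-row fallback described above.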
Maximizing Segment Elimination

The data of a CSI is divided into segments, and this information is stored in the 'column_store_segments' system table. The columns relevant to understanding segment elimination are in the query below.

Maximizing Segment Elimination - Code Block 1
    SELECT S.column_id
          ,S.segment_id
          ,S.min_data_id
          ,S.max_data_id
    FROM sys.column_store_segments S

Each segment stores the min and max value for the segment, and if the filter value does not fall within that range, the scan of that segment is skipped. This is called segment elimination. E.g. if we write a query on the data below with the filter 'OnlineSalesKey > 30000000', the second segment will be ignored.

Graph-16

In this example we can see that the min and max values are skewed. This is not ideal for segment elimination, because only one segment is eliminated. We need to find a way to arrange the data so that we have the maximum number of partitions and the values are aligned properly to the segments. We can use the techniques below.

Maximizing Segment Elimination - Code Block 2
    SELECT G.ContinentName
          ,SUM(S.SalesAmount) TotalSales
    FROM dbo.FactOnlineSales S
    JOIN dbo.DimCustomer C ON C.CustomerKey = S.CustomerKey
    JOIN dbo.DimGeography G ON G.GeographyKey = C.GeographyKey
    WHERE S.DateKey BETWEEN '2012-01-01' AND '2012-12-30'
    GROUP BY G.ContinentName
Graph-17
Graph-18

On running the above query again we find that the scan skips the partitions that fall outside the filter, and thus segment elimination is maximized. This approach is attractive, but it is very hard to manage these kinds of partitions, and it may end up becoming a tool of its own. Moreover, adding multiple other dimensions adds similar complexity to the partitions. We should also have enough data in each partition so that the segments are well utilized. With fewer than 1 million records per partition we may end up with a crash landing, and the queries may not benefit as expected.

Ensuring Batch Mode Execution

Batch mode, vector-based execution helps the query a lot. The MAXDOP configuration helps to check this behavior, by ensuring that none of the queries use the MAXDOP 1 option. The example below shows the difference in the execution plan. The query plan below shows that there is no use of parallelism or batch mode. Moreover, the cost of the CSI scan is also higher with MAXDOP 1.

Graph-19

Batch mode processing is not supported for outer joins in this release of SQL Server. To get the benefit of batch processing we need to change the queries a bit. A typical example of changing the query is shown below, where we first get the inner join values and then join them back to the dimension table for the outer join records. The query plans show quite different results, where batch mode and row mode are used along with parallelism. They also show that the alternate query takes just 12% of the relative cost.

These examples show that we need to redesign our conventional queries to take advantage of batch mode. The bottom line is that we have to take time and keep a close eye on each query written against a CSI. Query plans should be monitored closely for further changes, not only in development but also in production environments. The corresponding SQL code is 'Ensuring Batch Mode Execution – Code Block 1' in the SQL file.
Graph-20
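The outer join rewrite described above can be sketched as follows. This is an illustration of the idea (aggregate through an inner join first, then outer join the small aggregated result back to the dimension). It is not the paper's own Code Block 1, and the choice of DimProduct and BrandName simply follows the tables used in the earlier examples.

```sql
-- Original form: the outer join against the fact table is not eligible
-- for batch mode in SQL Server 2012.
SELECT P.BrandName
      ,SUM(S.SalesAmount) AS Sales
FROM dbo.DimProduct P
LEFT JOIN dbo.FactOnlineSales S ON S.ProductKey = P.ProductKey
GROUP BY P.BrandName;

-- Rewritten form: the large fact table is joined and aggregated with an
-- inner join, and only the aggregated rows take part in the outer join.
WITH SalesByProduct AS
(
    SELECT S.ProductKey
          ,SUM(S.SalesAmount) AS Sales
    FROM dbo.FactOnlineSales S
    JOIN dbo.DimProduct P ON P.ProductKey = S.ProductKey
    GROUP BY S.ProductKey
)
SELECT P.BrandName
      ,SUM(A.Sales) AS Sales
FROM dbo.DimProduct P
LEFT JOIN SalesByProduct A ON A.ProductKey = P.ProductKey
GROUP BY P.BrandName;
```

Both forms return one row per brand, with NULL sales for brands that have no fact rows. Whether the rewritten form actually runs in batch mode should be confirmed in the actual execution plan.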
