SQL Rally 2013: Columnstore Indexes

  1. COLUMNSTORE INDEXES: SQL Server 2012
     Denis Reznik, The Frayman Group, denisreznik@live.ru
  2. Columnstore indexes
     • Column Store vs. Row Store
     • Columnstore benefits
     • Columnstore indexes
     • CS indexes internals
     • Adding data to a columnstore index
  3. Row Store and Column Store
     • In a row store, data is stored tuple by tuple.
     • In a column store, data is stored column by column.
  4. Row Store and Column Store
     • Example table columns: id, name, address, city, state, age
     • Most queries do not process all the attributes of a particular relation.
     • SELECT c.name, c.address FROM Customers c WHERE c.region = 'Moscow';
  5. Row Store and Column Store
     • Row store: (+) easy to add/modify a record; (-) might read in unnecessary data
     • Column store: (+) only need to read in relevant data; (-) tuple writes require multiple accesses
     • So column stores are suitable for read-mostly, read-intensive, large data repositories.
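The trade-off on slide 5 can be sketched in a few lines of Python. This is only an illustrative model, not how SQL Server lays out pages: the sample rows are invented, the function names are hypothetical, and the filter uses the `city` column (the slide's query filters on `region`, which is not among the columns shown in the diagram).

```python
# Hypothetical sample data; only the column names come from the slides.
rows = [
    {"id": 1, "name": "Ann", "address": "1 Main St", "city": "Moscow", "state": "MOW", "age": 34},
    {"id": 2, "name": "Bob", "address": "2 Oak Ave", "city": "Kyiv", "state": "KV", "age": 41},
    {"id": 3, "name": "Eve", "address": "3 Elm Rd", "city": "Moscow", "state": "MOW", "age": 29},
]


def to_column_store(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def query_row_store(rows, city):
    """Row store: each row is read whole, so unused attributes are read too."""
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)  # all six attributes come along with the row
        if row["city"] == city:
            result.append((row["name"], row["address"]))
    return result, values_read


def query_column_store(columns, city):
    """Column store: only the three columns the query mentions are read."""
    needed = ("city", "name", "address")
    values_read = sum(len(columns[name]) for name in needed)
    matches = [i for i, value in enumerate(columns["city"]) if value == city]
    result = [(columns["name"][i], columns["address"][i]) for i in matches]
    return result, values_read
```

Both functions return the same rows; on this sample the row-store version touches 18 values and the column-store version 9. The flip side is that adding one record to `columns` means appending to six separate lists, which is the "tuple writes require multiple accesses" cost.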
  6. Compression
     • Trades I/O for CPU
       • Higher data value locality in column stores
       • Techniques such as run-length encoding are far more useful
     • Schemes
       • Null suppression
       • Dictionary encoding
       • Run-length encoding
       • Bit-vector encoding
       • Heavyweight schemes
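Two of the schemes listed on slide 6, dictionary encoding and run-length encoding, are simple enough to sketch directly. The code below is a generic textbook version with hypothetical function names; it is not a description of the actual on-disk format used by SQL Server.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []   # code -> original value
    code_of = {}      # original value -> code
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    values = []
    for value, length in runs:
        values.extend([value] * length)
    return values
```

For a column such as `["CA", "CA", "CA", "NY", "NY", "CA"]`, dictionary encoding yields the dictionary `["CA", "NY"]` and the codes `[0, 0, 0, 1, 1, 0]`; run-length encoding those codes yields `[(0, 3), (1, 2), (0, 1)]`. Runs get longer when similar values sit next to each other, which is why these schemes pay off more in a column store.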
  7. Columnar storage structure
     • [Diagram: columns C1, C2, C3, C4, C5, C6]
     • Uses VertiPaq compression
  8. Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes
  9. Improved Data Warehouse Query performance
     • Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
     • Performance improvements for “typical” data warehouse queries from 10x to 100x
     • Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
  10. Good Candidates for Columnstore Indexing
      • Table candidates:
        • Very large fact tables (for example, billions of rows)
        • Larger dimension tables (millions of rows) with compression-friendly column data
        • If unsure, it is easy to create a columnstore index and test the impact on your query workload
      • Query candidates (against a table with a columnstore index):
        • Scan versus seek (columnstore indexes don’t support seek operations)
        • Aggregated results far smaller than table size
        • Joins to smaller dimension tables
        • Filtering on fact / dimension tables (star schema pattern)
        • Subset of columns (being selective in columns versus returning ALL columns)
  11. Creating a columnstore index
      • T-SQL
      • SSMS
  12. Defining the Columnstore Index
      • Columnstore index is nonclustered (secondary)
      • Base table can be clustered index or heap
      • One CS index per table
      • Multiple other nonclustered (B-tree) indexes allowed, but may not be needed
      • CS index must be partition-aligned if table is partitioned
      • [Diagram: base table (clustered index OR heap); nonclustered index; nonclustered index; nonclustered columnstore index; indexed view; filtered index]
  13. Column Segments and Dictionaries
      • [Diagram: columns C1, C2, C3, C4, C5, C6, divided into segment 1 … segment N, with dictionaries]
      • Set of about 1M rows
      • Column segment
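As a rough model of the diagram on slide 13, a column can be cut into consecutive slices, each carrying a little metadata about its contents. The sketch below assumes segments are plain consecutive slices and uses a tiny `segment_size` for demonstration (the slide cites about 1M rows); the function and field names are hypothetical, and dictionaries are left out.

```python
def build_segments(column_values, segment_size):
    """Split one column into consecutive segments with min/max metadata."""
    segments = []
    for start in range(0, len(column_values), segment_size):
        chunk = column_values[start:start + segment_size]
        segments.append({
            "segment_id": len(segments) + 1,  # 1-based, in row order
            "row_count": len(chunk),
            "min_value": min(chunk),
            "max_value": max(chunk),
            "values": chunk,
        })
    return segments
```

Calling `build_segments([5, 1, 9, 7, 3, 8, 2], 3)` produces three segments holding `[5, 1, 9]`, `[7, 3, 8]` and `[2]`. Keeping the minimum and maximum per segment is what later allows whole segments to be skipped when a filter cannot match them.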
  14. Memory management
      • Memory management is automatic
      • Columnstore is persisted on disk
      • Needed columns fetched into memory
      • Columnstore segments flow between disk and memory
      • Example query: SELECT C2, SUM(C4) FROM T GROUP BY C2;
      • [Diagram: column segments of T.C1, T.C2, T.C3 and T.C4]
  15. Look inside Columnstore Indexes
  16. xVelocity
      • Microsoft SQL Server family of memory-optimized and in-memory technologies
        • xVelocity In-Memory Analytics Engine
        • xVelocity Memory-Optimized Columnstore Indexes
      • The xVelocity engine is designed with 3 principles in mind: Performance, Performance, Performance!
  17. How Are These Performance Gains Achieved?
      • Two complementary technologies:
        • Storage: data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row)
        • New “batch mode” execution: a vector-based query execution capability; data can then be processed in batches versus row-by-row
      • Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O
  18. Batch mode processing
      • Process ~1000 rows at a time
      • Batch object: column vectors and a bitmap of qualifying rows
      • Vector operators implemented
      • Greatly reduced CPU time (7 to 40X)
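The batch object on slide 18 (column vectors plus a bitmap of qualifying rows) can be mimicked as follows. This is a toy sketch of the data flow only: the names are hypothetical, the input is assumed to be a non-empty dict of equal-length lists, and plain Python will not reproduce the CPU savings the slide quotes.

```python
BATCH_SIZE = 1000  # the slide cites roughly 1000 rows per batch


def make_batches(columns, batch_size=BATCH_SIZE):
    """Yield batches: a slice of every column plus a qualifying-rows bitmap."""
    total_rows = len(next(iter(columns.values())))
    for start in range(0, total_rows, batch_size):
        vectors = {name: values[start:start + batch_size]
                   for name, values in columns.items()}
        rows_in_batch = len(next(iter(vectors.values())))
        yield {"vectors": vectors, "qualifying": [True] * rows_in_batch}


def filter_batch(batch, column, predicate):
    """Filter operator: clear bitmap bits instead of copying surviving rows."""
    vector = batch["vectors"][column]
    batch["qualifying"] = [keep and predicate(value)
                           for keep, value in zip(batch["qualifying"], vector)]
    return batch


def sum_batch(batch, column):
    """Aggregate operator: sum one vector over the rows still qualifying."""
    vector = batch["vectors"][column]
    return sum(value for keep, value in zip(batch["qualifying"], vector) if keep)


def filtered_sum(columns, filter_column, predicate, sum_column):
    """Run filter then aggregate, one batch at a time."""
    return sum(sum_batch(filter_batch(batch, filter_column, predicate), sum_column)
               for batch in make_batches(columns))
```

With `columns = {"qty": list(range(2500)), "price": [2] * 2500}`, the input splits into three batches, and `filtered_sum(columns, "qty", lambda q: q >= 2000, "price")` adds up the 500 qualifying prices. Each operator works on a whole vector per call, so per-row overhead is paid once per batch rather than once per row.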
  19. Segment Elimination
      select Date, count(*) from dbo.Purchase where Date >= 20120201 group by Date

      column_id  segment_id  min_data_id  max_data_id
      1          1           20120101     20120131
      1          2           20120115     20120215
      1          3           20120201     20120228
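The elimination decision for the slide 19 example can be worked through with the segment metadata shown there. The sketch below copies the three metadata rows from the slide; the function name is hypothetical and the check handles only a `>=` predicate.

```python
# Segment metadata as listed on the slide for the Date column.
segments = [
    {"column_id": 1, "segment_id": 1, "min_data_id": 20120101, "max_data_id": 20120131},
    {"column_id": 1, "segment_id": 2, "min_data_id": 20120115, "max_data_id": 20120215},
    {"column_id": 1, "segment_id": 3, "min_data_id": 20120201, "max_data_id": 20120228},
]


def segments_to_scan(segments, lower_bound):
    """Return ids of segments that may hold a value >= lower_bound.

    A segment whose maximum is below the bound cannot contain a match,
    so it is skipped without being read.
    """
    return [segment["segment_id"]
            for segment in segments
            if segment["max_data_id"] >= lower_bound]
```

For the predicate `Date >= 20120201`, segment 1 is eliminated because its maximum (20120131) is below the bound. Segment 2 still has to be scanned even though it also contains January dates, since its range overlaps the predicate.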
  20. Columnstore format + batch mode variations
      • Columnstore indexing alone + traditional row mode in Query Processor
      • Columnstore indexing + batch mode in Query Processor
      • Columnstore indexing + hybrid of batch and traditional row mode in Query Processor
  21. Plan operators supported in batch mode
      • Filter
      • Project
      • Scan
      • Local hash (partial) aggregation
      • Hash inner join
      • (Batch) hash table build
  22. Query processing with Columnstore Indexes
  23. Maintaining Data in a Columnstore Index
      • Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
      • ALTER INDEX REBUILD / REORGANIZE not allowed
      • How can I modify index data?
        • Drop columnstore index / make modifications / add columnstore index
        • UNION ALL (but be sure to validate performance)
        • Partition switches (IN and OUT)
  24. Insert data into table with Columnstore Index
  25. Summary
      • SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
        • 10x to 100x performance improvement depending on the schema and query
        • I/O reduction and memory savings through columnstore compressed storage
        • CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
      • Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
        • No need to create intermediate tables, aggregates, pre-processing and cubes
      • Interoperability with partitioning
  26. Resources
      • Columnar Storage in SQL Server 2012 (PDF)
      • SQL Server Columnstore Performance Tuning
      • Inside the SQL Server 2012 Columnstore Indexes
      • 24 HOP Russia 2013, Dmitry Pilyugin (video, in Russian)
      • SQL Server Columnstore Performance Tuning (video)
  27. SQL SERVER 2012 - COLUMNSTORE INDEXES
      Denis Reznik, Senior Database Architect at The Frayman Group, Microsoft SQL Server MVP
      denisreznik@live.ru, @denisreznik, http://reznik.uneta.com.ua
