In-Memory Columnstore Indexes: Make Your Data Warehouse Fly


Presentation on the columnstore index feature in SQL Server 2012 and 2014, given to the Philadelphia SQL BI Usergroup on November 19, 2013

Published in: Technology
  • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
  • Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index, or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.
  • The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
  • What data types cannot be used in a columnstore index? The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml. The SQL Server 2012 implementation did not support a number of data types, such as numeric beyond precision 18, datetimeoffset beyond precision 2, GUID and binary columns. The upcoming version adds support for the data types named in the previous sentence. It also introduces support for storing short strings by value instead of converting all strings to a 32-bit id within a dictionary. This removes the extra overhead associated with the dictionary and helps improve the columnstore compression even further.
  • Include every column of the table in the columnstore index. If you don't, then a query that references a column not included in the index will not benefit from the columnstore index much or at all. Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way. Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."
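The data type restrictions in the notes above can be turned into a quick pre-flight check. The following is a minimal Python sketch, not a SQL Server API: the function names `is_supported_2012` and `blocking_columns` are hypothetical, and the type list and precision limits are copied from the notes as they describe SQL Server 2012.

```python
# Types the notes list as unusable in a SQL Server 2012 columnstore index.
UNSUPPORTED_2012 = {
    "binary", "varbinary", "image", "text", "ntext",
    "varchar(max)", "nvarchar(max)", "cursor", "hierarchyid",
    "timestamp", "uniqueidentifier", "sql_variant", "xml",
}

# Types that are usable only up to a maximum precision (limits from the notes).
PRECISION_LIMITS = {"decimal": 18, "numeric": 18, "datetimeoffset": 2}


def is_supported_2012(type_name, precision=None):
    """Return True if the type is allowed according to the list in the notes."""
    name = type_name.strip().lower()
    if name in UNSUPPORTED_2012:
        return False
    limit = PRECISION_LIMITS.get(name)
    if limit is not None and precision is not None and precision > limit:
        return False
    return True


def blocking_columns(columns):
    """Given (column, type, precision) tuples, return the columns that block the index."""
    return [column for column, type_name, precision in columns
            if not is_supported_2012(type_name, precision)]


# Hypothetical fact table definition, used only for illustration.
fact_sales = [
    ("DateKey", "int", None),
    ("Amount", "decimal", 19),
    ("RowGuid", "uniqueidentifier", None),
    ("Quantity", "smallint", None),
]
print(blocking_columns(fact_sales))  # prints ['Amount', 'RowGuid']
```

A check like this only mirrors the list in the notes; the server itself remains the authority on which columns an index will accept.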

    1. 1. In-Memory Columnstore Indexes-Make Your Data Warehouse Fly Joe D’Antoni Philadelphia SQL Server Business Intelligence Group 19 November 2013
    2. 2. About Me Solution Architect, Anexinet @jdanton – Twitter – Blog, Slides
    3. 3. Agenda • Indexes: a basic overview • Columnstore: an introduction • Report Performance: Demo • 2012 and 2014: What’s Changing? • 2014: Demo • Questions
    4. 4. Indexes • Data Structure that allows us to speed data retrieval, by maintaining an extra copy of data • Can be filtered • Can be function based, or ordered • Penalty is that writes become more expensive • More storage required
    5. 5. Indexes in SQL Server • Clustered vs Nonclustered • Non-clustered index: “just an index”
    6. 6. Clustered Index • Data is ordered as it is inserted into pages • Data in clustered index is only stored on disk once (it’s the data from the tables) • Table without a clustered index is called a heap, with no order at all
    7. 7. Non-Clustered Index • Duplicate copy of the data in table • Provides pointer from index to table data • No specific order of data in index
    8. 8. So Why All This Talk About Indexes?
    9. 9. Data Warehouse Queries • Data Warehouses have a lot of data • Querying lots of data can take a really long time • Processing data row by row may not be the most efficient way to perform aggregations
    10. 10. Traditional Approaches To Improving Performance • Partitioned Tables • Indexed Views • Data Compression
    11. 11. Introducing Columnstore Indexes (SQL 2012) • Data is stored in columns, as opposed to rows • This allows a much higher rate of compression • Columns not used in a query are simply not scanned, nor returned • Recommended practice is to add most columns in a table to the index
    12. 12. Columnar Data Storage
    13. 13. Columnstore 2012 Demo
    14. 14. So How Is It So Much Faster? • Very good compression ratio for column-oriented data • Better use of Memory • Segment Elimination Skips Large Chunks of Data • Batch Mode • Processes data in “batches” of 1000 rows rather than row by row • 7-40x CPU savings with batch mode “The key to getting the best performance is to make sure your queries process the large majority of data in
    15. 15. Columnstore All The Things? • Awesome performance, so what’s the negative? • Can’t update/insert in 2012 • Can only be a nonclustered index, so we are storing more data on disk • Data types are somewhat limited • One index per table • Can’t be a sorted index
    16. 16. So Where To Use Columnstore Indexes? • Only on Large Tables—Fact tables and Dimension Tables > 3 Million Rows • Include Every Column • Structure Queries as star joins with grouping and aggregation More details here
    17. 17. Columnstore 2014
    18. 18. Columnstore in 2014 • Fewer Data Type Limitations • Updateable • Can be Clustered Index • New Archival Compression Mode • Batch Mode Improvements
    19. 19. Columnstore Updates (2014) • Updates To Index Collected until they reach 1000 rows • Tuple Movers Move into Index
    20. 20. Columnstore Updates (2014) • Bulk Inserts go through special API • Updates are processed as inserts and deletes, so expensive operation
    21. 21. Columnstore 2014 Demo
    22. 22. What Do We Do Differently in 2014 • Best Practices are mostly the same • Batch mode gets enhanced and gains more query types • No need to worry about dropping and rebuilding indexes—just append data • Still focus on large tables where data is not frequently updated • Archival Compression Good for old unused data
    23. 23. Questions
    24. 24. Contact @jdanton
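Slides 11 and 14 describe column-wise storage and segment elimination. The sketch below is a toy Python model of those two ideas, not SQL Server's implementation: the `Segment` class, the function names, and the segment size of 4 are all assumptions made for illustration (real segments are far larger). Each column is split into segments that carry min/max metadata, and a filtered aggregate skips any segment whose range cannot match the predicate.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One chunk of a single column, plus min/max metadata for that chunk."""
    values: list
    min_value: int
    max_value: int


def build_segments(values, segment_size):
    """Split one column into fixed-size segments and record each segment's range."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def sum_where_between(date_segments, amount_segments, low, high):
    """Sum amounts for rows whose date key is in [low, high].

    Returns (total, segments_scanned). Segments whose min/max range lies
    entirely outside the filter are skipped without reading their values.
    """
    total = 0
    scanned = 0
    for dates, amounts in zip(date_segments, amount_segments):
        if dates.max_value < low or dates.min_value > high:
            continue  # segment elimination: nothing in here can match
        scanned += 1
        for date_key, amount in zip(dates.values, amounts.values):
            if low <= date_key <= high:
                total += amount
    return total, scanned


# Two columns of a hypothetical fact table, stored separately (column-wise).
date_keys = [20130101] * 4 + [20130201] * 4 + [20130301] * 4
amounts = list(range(1, 13))

date_segments = build_segments(date_keys, segment_size=4)
amount_segments = build_segments(amounts, segment_size=4)

total, scanned = sum_where_between(date_segments, amount_segments, 20130201, 20130228)
print(total, scanned)  # prints 26 1
```

Only the two columns the query touches are ever read, and only one of the three segments is scanned. Elimination pays off here because the date keys arrive roughly in order, so each segment covers a narrow range.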
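Slides 19 and 20 describe how the 2014 columnstore accepts changes: new rows are collected row-wise, a tuple mover later moves them into the column-wise index, and an update is handled as a delete plus an insert. The following Python sketch models that flow under loud assumptions: `ToyColumnstore` and all of its methods are hypothetical, the default threshold of 1000 is simply the figure on the slide, and the real engine's data structures are not shown in the deck.

```python
class ToyColumnstore:
    """Toy model: a row-wise delta buffer in front of column-wise segments."""

    def __init__(self, threshold=1000):
        self.threshold = threshold  # rows to collect before the tuple mover runs
        self.delta = []             # row-wise buffer of (locator, row) pairs
        self.segments = []          # column-wise segments built by the tuple mover
        self.deleted = set()        # locators of rows marked as deleted
        self._next_locator = 0

    def insert(self, row):
        """Buffer a new row; run the tuple mover once enough rows have collected."""
        locator = self._next_locator
        self._next_locator += 1
        self.delta.append((locator, dict(row)))
        if len(self.delta) >= self.threshold:
            self.move_tuples()
        return locator

    def move_tuples(self):
        """Pivot the buffered rows into one column-wise segment."""
        if not self.delta:
            return
        names = list(self.delta[0][1].keys())
        self.segments.append({
            "locators": [locator for locator, _ in self.delta],
            "columns": {name: [row[name] for _, row in self.delta] for name in names},
        })
        self.delta = []

    def update(self, locator, new_row):
        """An update is a delete of the old row plus an insert of the new one."""
        self.deleted.add(locator)
        return self.insert(new_row)

    def column(self, name):
        """Read one column across segments and the delta buffer, skipping deleted rows."""
        values = []
        for segment in self.segments:
            for locator, value in zip(segment["locators"], segment["columns"][name]):
                if locator not in self.deleted:
                    values.append(value)
        values.extend(row[name] for locator, row in self.delta
                      if locator not in self.deleted)
        return values


store = ToyColumnstore(threshold=3)  # tiny threshold so the example stays readable
locators = [store.insert({"id": i, "amount": i * 10}) for i in range(4)]
store.update(locators[1], {"id": 1, "amount": 99})
print(store.column("amount"))  # prints [0, 20, 30, 99]
```

The update never rewrites the compressed segment: the old row is only marked deleted and the new version waits in the buffer. That double bookkeeping is one way to picture why slide 20 calls updates an expensive operation.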