Column store indexes and batch processing mode (nx power lite)

  1. - An independent SQL consultant
     - A user of SQL Server from version 2000 onwards, with 12+ years of experience
  2. CPU Cache, Memory and IO Subsystem Latency
     [Diagram: four cores, each with L1 and L2 caches, and an L3 cache; latency scale marked 1 ns, 10 ns, 100 ns, 10 us, 100 us, 10 ms]
  3. The “Cache out” Curve
     Every time we drop out of a cache and use the next slower one down, we pay a big throughput penalty.
     [Figure: throughput against touched data size, with regions labelled CPU Cache, TLB, NUMA Remote, Storage]
  4. Sequential Versus Random Page CPU Cache Throughput
     [Figure: throughput in million pages/sec (0 to 1,000) against size of accessed memory (0 to 32 MB); series: Random Pages, Sequential Pages, Single Page; annotation: Service Time + Wait Time]
  5. Moore's Law Vs. Advancements In Disk Technology
     - “Transistors per square inch on integrated circuits has doubled every two years since the integrated circuit was invented”
     - Spinning disk state of play:
       - Interfaces have evolved
       - Areal density has increased
       - Rotation speed has peaked at 15K RPM
       - Not much else . . .
     - Up until NAND flash, disk-based IO subsystems have not kept pace with CPU advancements.
     - With next-generation storage (resistive RAM etc.), CPUs and storage may follow the same curve.
  6. How do rows travel between iterators?
     [Diagram: a chain of iterators linked by control flow and data flow; each hop between iterators is labelled "Row by row"]
  7. - Query execution which leverages CPU caches.
     - Breakthrough levels of compression to bridge the performance gap between IO subsystems and modern processors.
     - Better query execution scalability as the degree of parallelism increases.
  8. - First introduced in SQL Server 2012, greatly enhanced in 2014.
     - A batch is roughly 1000 rows in size and is designed to fit into the L2/L3 cache of the CPU; remember the slide on latency.
     - Moving batches around is very efficient*: one test showed that a regular row-mode hash join consumed about 600 instructions per row, while the batch-mode hash join needed about 85 instructions per row and, in the best case (small, dense join domain), was as low as 16 instructions per row.
     * From: Enhancements To SQL Server Column Stores, Microsoft Research
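To make the row-mode versus batch-mode distinction concrete, here is a minimal conceptual sketch in Python. It is not how SQL Server implements its iterators; the function names are hypothetical, and the batch size of 1000 is taken from the "roughly 1000 rows" figure on the slide. It only counts how many hand-offs between iterators each model needs to aggregate the same data.

```python
from typing import Iterator, List, Tuple

BATCH_SIZE = 1000  # "roughly 1000 rows" per the slide; treated here as an assumption


def scan_rows(table: List[int]) -> Iterator[int]:
    """Row mode: hand one row at a time to the parent iterator."""
    for row in table:
        yield row


def scan_batches(table: List[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Batch mode: hand a block of rows to the parent iterator per call."""
    for start in range(0, len(table), batch_size):
        yield table[start:start + batch_size]


def sum_row_mode(table: List[int]) -> Tuple[int, int]:
    """Aggregate row by row; returns (total, number of hand-offs)."""
    total, handoffs = 0, 0
    for row in scan_rows(table):
        total += row
        handoffs += 1
    return total, handoffs


def sum_batch_mode(table: List[int]) -> Tuple[int, int]:
    """Aggregate batch by batch; returns (total, number of hand-offs)."""
    total, handoffs = 0, 0
    for batch in scan_batches(table):
        total += sum(batch)  # tight loop over a block small enough to stay cache resident
        handoffs += 1
    return total, handoffs


table = list(range(10_000))
row_total, row_handoffs = sum_row_mode(table)
batch_total, batch_handoffs = sum_batch_mode(table)
print(row_total == batch_total, row_handoffs, batch_handoffs)
```

Both modes produce the same total; the batch version simply crosses the iterator boundary once per block instead of once per row, which is the per-row overhead the instruction counts above refer to.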
  9. xperf -on base -stackwalk profile

     SELECT p.EnglishProductName
           ,SUM([OrderQuantity])
           ,SUM([UnitPrice])
           ,SUM([ExtendedAmount])
           ,SUM([UnitPriceDiscountPct])
           ,SUM([DiscountAmount])
           ,SUM([ProductStandardCost])
           ,SUM([TotalProductCost])
           ,SUM([SalesAmount])
           ,SUM([TaxAmt])
           ,SUM([Freight])
     FROM   [dbo].[FactInternetSales] f
     JOIN   [dbo].[DimProduct] p
     ON     f.ProductKey = p.ProductKey
     GROUP BY p.EnglishProductName

     xperf -d stackwalk.etl
     xperfview stackwalk.etl
  10. Conceptual View . . .
      [Diagram: load segments into the blob cache, then break blobs into batches and pipeline them into the CPU cache]
      . . . and what's happening in the call stack
  11. SELECT p.EnglishProductName
            ,SUM([OrderQuantity])
            ,SUM([UnitPrice])
            ,SUM([ExtendedAmount])
            ,SUM([UnitPriceDiscountPct])
            ,SUM([DiscountAmount])
            ,SUM([ProductStandardCost])
            ,SUM([TotalProductCost])
            ,SUM([SalesAmount])
            ,SUM([TaxAmt])
            ,SUM([Freight])
      FROM   [dbo].[FactInternetSalesFio] f
      JOIN   [dbo].[DimProduct] p
      ON     f.ProductKey = p.ProductKey
      GROUP BY p.EnglishProductName

      [Chart: row mode versus batch mode, axis 0 to 500; annotation: "x12", batch at DOP 2]
  12. Row mode Hash Match Aggregate: 445,585 ms*
      Vs.
      Batch mode Hash Match Aggregate: 78,400 ms*
      * Timings are a statistical estimate
  13. - Compressing data going down the column is far superior to compressing data going across the row; also, we only retrieve the column data that is of interest.
      - Run-length compression is used in order to achieve this.
      - SQL Server 2012 introduces column store compression . . ., SQL Server 2014 adds more features to this.

      Colour column: Red, Red, Blue, Blue, Green, Green, Green

      Dictionary
        Lookup ID   Label
        1           Red
        2           Blue
        3           Green

      Segment
        Lookup ID   Run Length
        1           2
        2           2
        3           3
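The dictionary and segment on this slide can be reproduced with a few lines of Python. This is an illustrative sketch of dictionary encoding followed by run-length encoding, using the slide's Colour column as input; the function names are hypothetical and the code says nothing about SQL Server's actual on-disk format.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[int, str], List[int]]:
    """Assign each distinct value a lookup ID (1, 2, 3, ...) in order of first appearance."""
    ids: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids) + 1
        encoded.append(ids[value])
    dictionary = {lookup_id: label for label, lookup_id in ids.items()}
    return dictionary, encoded


def run_length_encode(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (lookup ID, run length) pairs."""
    return [(lookup_id, sum(1 for _ in run)) for lookup_id, run in groupby(ids)]


def decode(dictionary: Dict[int, str], segment: List[Tuple[int, int]]) -> List[str]:
    """Expand the run-length pairs back into the original column values."""
    out: List[str] = []
    for lookup_id, run_length in segment:
        out.extend([dictionary[lookup_id]] * run_length)
    return out


colours = ["Red", "Red", "Blue", "Blue", "Green", "Green", "Green"]
dictionary, ids = dictionary_encode(colours)
segment = run_length_encode(ids)

print(dictionary)  # {1: 'Red', 2: 'Blue', 3: 'Green'}
print(segment)     # [(1, 2), (2, 2), (3, 3)]
```

Seven string values shrink to a three-entry dictionary plus three (ID, run length) pairs, and `decode(dictionary, segment)` returns the original column.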
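  14. SQL Server 2014 Column Store Storage Internals
      [Diagram: a table with columns A, B and C is divided into row groups; each row group is encoded and compressed into column segments, which are stored as blobs. Labels also shown: "< 1,048,576 rows", "Delta stores"]
  15. [Diagram: global dictionary, local dictionary, deletion bitmap, column store segments, delta store (B-tree) and tuple mover]
      - Inserts of 1,048,576 rows and over go to the column store segments.
      - Inserts of fewer than 1,048,576 rows, and updates, go to the delta store.
      - update = insert into the delta store + insert into the deletion bitmap
      - The tuple mover moves rows from the delta store into column store segments.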
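The write path on this slide can be sketched as a toy model. Everything below is hypothetical (class name, method names, the use of plain Python lists in place of compressed segments and a B-tree); the only things taken from the slides are the 1,048,576 row threshold, the rule "update = insert into the delta store + insert into the deletion bitmap", and the row_group_id:tuple_id style of locating a row. The demo uses a row group size of 4 so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple

ROW_GROUP_SIZE = 1_048_576  # threshold quoted on the slide


@dataclass
class ToyColumnStore:
    """Toy model only: lists stand in for compressed segments and the delta store B-tree."""
    row_group_size: int = ROW_GROUP_SIZE
    segments: List[List[dict]] = field(default_factory=list)
    delta_store: List[dict] = field(default_factory=list)
    # Set of (row_group_id, tuple_id) pairs marking logically deleted segment rows.
    deletion_bitmap: Set[Tuple[int, int]] = field(default_factory=set)

    def insert(self, rows: List[dict]) -> None:
        """Large inserts become a segment directly; small ones land in the delta store."""
        if len(rows) >= self.row_group_size:
            self.segments.append(list(rows))
        else:
            self.delta_store.extend(rows)

    def _locate(self, key: int) -> Optional[Tuple[int, int]]:
        """Find the live segment row with this key, if any."""
        for g, group in enumerate(self.segments):
            for t, row in enumerate(group):
                if row["key"] == key and (g, t) not in self.deletion_bitmap:
                    return g, t
        return None

    def delete(self, key: int) -> None:
        """Segment rows are only marked deleted; delta store rows are removed."""
        location = self._locate(key)
        if location is not None:
            self.deletion_bitmap.add(location)
        else:
            self.delta_store = [r for r in self.delta_store if r["key"] != key]

    def update(self, key: int, new_row: dict) -> None:
        """update = mark the old row deleted + insert the new version."""
        self.delete(key)
        self.insert([new_row])

    def tuple_mover(self) -> None:
        """Move full row groups from the delta store into segments."""
        while len(self.delta_store) >= self.row_group_size:
            self.segments.append(self.delta_store[:self.row_group_size])
            self.delta_store = self.delta_store[self.row_group_size:]

    def scan(self) -> Iterator[dict]:
        """Return live rows: segment rows not in the bitmap, then delta store rows."""
        for g, group in enumerate(self.segments):
            for t, row in enumerate(group):
                if (g, t) not in self.deletion_bitmap:
                    yield row
        yield from self.delta_store


store = ToyColumnStore(row_group_size=4)
store.insert([{"key": k, "qty": 1} for k in range(4)])        # large insert: new segment
store.insert([{"key": 4, "qty": 1}])                          # small insert: delta store
store.update(2, {"key": 2, "qty": 9})                         # bitmap entry + delta store row
store.insert([{"key": 5, "qty": 1}, {"key": 6, "qty": 1}])    # delta store now holds 4 rows
store.tuple_mover()                                           # delta store becomes a segment
visible = sorted((r["key"], r["qty"]) for r in store.scan())
print(visible)
```

The old version of key 2 stays physically in the first segment and is hidden only by the deletion bitmap, which is why a scan has to consult the bitmap as well as the segments and the delta store.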
  16. SELECT [ProductKey]
            ,[OrderDateKey]
            ,[DueDateKey]
            ,[ShipDateKey]
            ,[CustomerKey]
            ,[PromotionKey]
            ,[CurrencyKey]
            . .
      INTO   FactInternetSalesBig
      FROM   [dbo].[FactInternetSales]
      CROSS JOIN master..spt_values AS a
      CROSS JOIN master..spt_values AS b
      WHERE  a.type = 'p'
      AND    b.type = 'p'
      AND    a.number <= 80
      AND    b.number <= 100

      494,116,038 rows

      [Chart: size (Mb, 0 to 80,000) by storage type: heap, row compression, page compression, clustered column store index, column store archive compression; percentage labels on the chart, in order: 57 %, 74 %, 92 %, 94 %]
  17. Query 1:

      SELECT a.number
      INTO   OrderedSequence
      FROM   master..spt_values AS a
      CROSS JOIN master..spt_values AS b
      CROSS JOIN master..spt_values AS c
      WHERE  c.number <= 57
      ORDER BY a.number

      Query 2:

      SELECT a.number
      INTO   RandomSequence
      FROM   master..spt_values AS a
      CROSS JOIN master..spt_values AS b
      CROSS JOIN master..spt_values AS c
      WHERE  c.number <= 57
      ORDER BY NEWID()

      Uncompressed size: 1.5 billion rows, 39,233.86 Mb
      Size after column store compression (in the order shown on the slide): 17.85 Mb, 18.48 Mb
      (The slide also shows the value 1048576.)
  18. Feature | SQL Server 2012 | SQL Server 2014
      Column store indexes | Yes | Yes
      Clustered column store indexes | No | Yes
      Updateable column store indexes | No | Yes
      Column store archive compression | No | Yes
      Columns in a column store index can be dropped | No | Yes
      Support for GUID, binary, datetimeoffset precision > 2, numeric precision > 18 | No | Yes
      Enhanced compression by storing short strings natively (instead of 32 bit IDs) | No | Yes
      Bookmark support (row_group_id:tuple_id) | No | Yes
      Mixed row / batch mode execution | No | Yes
      Optimized hash build and join in a single iterator | No | Yes
      Hash memory spills cause row mode execution | No | Yes
      Iterators supported | Scan, filter, project, hash (inner) join and (local) hash aggregate | Yes
  19. Disclaimer: your own mileage may vary depending on your data, hardware and queries.
  20. Hardware
      - 2 x 2.0 GHz 6 core Xeon CPUs
      - Hyper-threading enabled
      - 22 GB memory
      - RAID 0: 6 x 250 GB SATA III HD 10K RPM
      - RAID 0: 3 x 80 GB Fusion IO
      Software
      - Windows Server 2012
      - SQL Server 2014 CTP 2
      - AdventureWorksDW DimProduct table
      - Enlarged FactInternetSales table
  21. SELECT SUM([OrderQuantity])
            ,SUM([UnitPrice])
            ,SUM([ExtendedAmount])
            ,SUM([UnitPriceDiscountPct])
            ,SUM([DiscountAmount])
            ,SUM([ProductStandardCost])
            ,SUM([TotalProductCost])
            ,SUM([SalesAmount])
            ,SUM([TaxAmt])
            ,SUM([Freight])
      FROM   [dbo].[FactInternetSales]

      [Chart: Compression Type / Time (ms); y-axis 0 to 300,000 ms; two groups of bars, each with no compression, row compression and page compression; data labels: 2050 Mb/s at 85% CPU, 678 Mb/s at 98% CPU, 256 Mb/s at 98% CPU]
  22. No compression: 545,761 ms*
      Vs.
      Page compression: 1,340,097 ms*
      * All stack trace timings are a statistical estimate
  23. [Chart: Elapsed Time (ms) / Column Store Compression Type; y-axis 0 to 4,500 ms; bars: hdd cstore, hdd cstore archive, flash cstore, flash cstore archive; data labels: 52 Mb/s at 99% CPU, 27 Mb/s at 56% CPU]
  24. Clustered column store index: 60,651 ms
      Vs.
      Clustered column store index with archive compression: 61,196 ms
  25. We will look at the best we can do without column store indexes:
      - Partitioned heap fact table with page compression for spinning disk
      - Partitioned heap fact table without any compression on our flash storage
      - Non-partitioned column store indexes on both types of store, with and without archive compression

      SELECT p.EnglishProductName
            ,SUM([OrderQuantity])
            ,SUM([UnitPrice])
            ,SUM([ExtendedAmount])
            ,SUM([UnitPriceDiscountPct])
            ,SUM([DiscountAmount])
            ,SUM([ProductStandardCost])
            ,SUM([TotalProductCost])
            ,SUM([SalesAmount])
            ,SUM([TaxAmt])
            ,SUM([Freight])
      FROM   [dbo].[FactInternetSales] f
      JOIN   [dbo].[DimProduct] p
      ON     f.ProductKey = p.ProductKey
      GROUP BY p.EnglishProductName
  26. [Chart: Join Scalability DOP / Time (ms); y-axis 0 to 800,000 ms; x-axis degree of parallelism 2 to 24; series: HDD page compressed partitioned fact table, Flash partitioned fact table]
  27. [Chart: Join Scalability DOP / Time (ms); y-axis 0 to 60,000 ms; x-axis degree of parallelism 2 to 24; series: hdd column store, hdd column store archive, flash column store, flash column store archive]
  28. A SQL Server workload should scale up to the limits of hardware, such that:
      - All CPU capacity is exhausted, or
      - All storage IOPS bandwidth is exhausted
      As concurrency increases, we need to watch out for “the usual suspects” that can throttle throughput back:
      - Latch contention
      - Lock contention
      - Spinlock contention
  29. [Chart: elapsed time (ms, 0 to 40,000) and percentage CPU utilisation (0 to 120); x-axis ticks 2 to 24 in steps of 2, paired with 1 to 12]
  30. [Chart: waiting latch request count (0 to 8,000) and percentage CPU utilisation (0 to 100); x-axis ticks 2 to 24 in steps of 2, paired with 1 to 12]
  31. [Chart: spinlock spin count in 1000s (0 to 10,000) and percentage CPU utilisation (0 to 100); x-axis ticks 2 to 24 in steps of 2, paired with 1 to 12]
  32. What most people tend to have:
      CPU used for IO consumption + CPU used for decompression < total CPU capacity
      Compression works for you :)
  33. CPU used for IO consumption + CPU used for decompression > total CPU capacity
      Compression works against you :(
      CPU used for IO consumption + CPU used for decompression = total CPU capacity
      Nothing to be gained or lost from using compression
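The three cases on these two slides amount to a single comparison, sketched below. The function and parameter names are hypothetical, and the inputs are assumed to be expressed in the same unit (for example, percentage points of total CPU); how you measure the two CPU costs is outside the scope of this sketch.

```python
def compression_verdict(cpu_for_io: int, cpu_for_decompression: int,
                        total_cpu_capacity: int) -> str:
    """Classify a workload using the rule of thumb from the slides."""
    demand = cpu_for_io + cpu_for_decompression
    if demand < total_cpu_capacity:
        return "compression works for you"
    if demand > total_cpu_capacity:
        return "compression works against you"
    return "nothing to be gained or lost from using compression"


# Illustrative figures only, in percentage points of total CPU.
print(compression_verdict(40, 30, 100))
print(compression_verdict(70, 50, 100))
print(compression_verdict(60, 40, 100))
```

The made-up inputs exercise each branch in turn: spare CPU headroom, CPU oversubscribed, and exactly at capacity.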
  34. - No significant difference in terms of performance between column store compression and column store archive compression.
      - Pre-sorting the data makes little difference to compression ratios.
      - Batch mode:
        - Provides a tremendous performance boost with just two schedulers.
        - Does not provide linear scalability with the hardware available.
        - Does provide an order of magnitude increase in JOIN performance.
        - Performs marginally better with column store indexes which do not use archive compression.
  35. - Enhancements To Column Store Indexes (SQL Server 2014), Microsoft Research
      - SQL Server Clustered Columnstore Tuple Mover, Remus Rusanu
      - SQL Server Columnstore Indexes at TechEd 2013, Remus Rusanu
      - The Effect of CPU Caches and Memory Access Patterns, Thomas Kejser
  36. Thomas Kejser, former SQL CAT member and CTO of Livedrive
  37. ChrisAdkin8
      chris1adkin@yahoo.co.uk
      http://uk.linkedin.com/in/wollatondba