PASS Chapter Meeting Dec 2013 - Compression: a hidden Gem for IO heavy Databases - Charley Hanania

Compression: a hidden Gem for IO heavy Databases

The limiting factor in most database systems is the ability to read and write data to the IO subsystem.

We're still using storage layouts and methodologies in SQL Server that are a reflection of old spinning media in times gone by.

Until major changes are made to the internal storage layouts, we have "some" hope with options such as data compression, sparse columns and filtered indexes, which not only save space on disk, but also reflect a saving in memory.

In this session we will go over the IO savings technologies presented in SQL Server, and discuss how implementing some of these will assist in your operational performance goals.

Presenter: Charley Hanania, MVP
Charley is Principal Consultant at QS2 AG in Switzerland and has consulted to organisations of all sizes during his extensive career in Database and Platform Consulting.
He's been focussed on SQL Server since v4.2 on OS/2. With over 15 years of experience in IT, he has supported companies in the areas of DB training, development, architecture & administration throughout Europe, America & Australasia.
Communities are Charley's passion. He became active in database communities in the mid-90s, participating in heterogeneous database user groups in Australia, and continues to play an active role through community events such as Database Days, the European PASS Conference, PASS & the Swiss PASS Chapter.

  • The first column actually takes up zero bytes, as it matches the stored column prefix exactly – a saving of four bytes! All the others store a single byte that identifies how many bytes to use from the stored column prefix, followed by the differential bytes.
  • sp_estimate_data_compression_savings [ @schema_name = ] 'schema_name' , [ @object_name = ] 'object_name' , [@index_id = ] index_id , [@partition_number = ] partition_number , [@data_compression = ] 'data_compression' [;]
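As a minimal sketch of how the procedure above might be called: the table name dbo.OrderHistory is a hypothetical placeholder, not something taken from the slides. Passing NULL for the index and partition asks for an estimate across all of them.

```sql
-- Hypothetical table name (dbo.OrderHistory); substitute your own object.
-- Estimates the effect of PAGE compression on every index and partition.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'OrderHistory',
     @index_id         = NULL,     -- NULL = all indexes on the table
     @partition_number = NULL,     -- NULL = all partitions
     @data_compression = N'PAGE';  -- N'ROW' or N'NONE' are also accepted
```

The result set reports the current size and the estimated size under the requested setting, in KB, per index and partition. The estimate is based on a sample of the data, so treat the figures as indicative rather than exact.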

Transcript

  • 1. Swiss PASS Chapter December Chapter Meeting 2013 Hotel Continental, Zurich 10th December 2013
  • 2. Swiss PASS Chapter December Chapter Meeting 2013 Novotel Hotel, Geneva 11th December 2013
  • 3. December 2013 Meeting Agenda
      15:30 - 15:45 : Arrival and Registration
      15:45 - 16:00 : Introduction & Welcome
      16:00 - 17:15 : HDInsight: Introduction to Hadoop on Windows Azure
      17:15 - 17:45 : Break & Refreshments
      17:45 - 19:00 : Compression: a hidden Gem for IO heavy Databases
      19:00 -       : End / Drinks
  • 4. Session 1: HDInsight: Introduction to Hadoop on Windows Azure Speaker: Scott Klein – Microsoft Corp Apache Hadoop is open source software that was born out of the large web properties' need to process massive amounts of data. Now Hadoop is becoming the de facto platform for big data storage and processing in the enterprise. A modern Hadoop-based data platform is a combination of multiple open source projects brought together, tested and integrated to create an enterprise-grade platform. In this session we review the Hadoop projects required in a Windows Hadoop platform and how it integrates with Microsoft SQL Server Database Platform analysis tools. We drill down into Hadoop integration and implementation on Windows and Windows Azure, and integration with the SQL Server Data Platform. We will also take a detailed look at Hadoop in Windows Azure, called HDInsight, and how it provides the enterprise processing required with today's data and information.
  • 5. Session 2: Compression: a hidden Gem for IO heavy Databases Speaker: Charley Hanania, MVP The limiting factor in most database systems is the ability to read and write data to the IO subsystem. We're still using storage layouts and methodologies in SQL Server that are a reflection of old spinning media in times gone by. Until major changes are made to the internal storage layouts, we have "some" hope with options such as data compression, sparse columns and filtered indexes, which not only save space on disk, but also reflect a saving in memory. In this session we will go over the IO savings technologies presented in SQL Server, and discuss how implementing some of these will assist in your operational performance goals.
  • 6. Virtual Chapter Meetings – December
      • Performance: Dec 5 – 14:00 (GMT-05:00)
          Natural Born Killers, Performance Issues to Avoid – presented by Richard Douglas
      • Application Development: Dec 6 – 12:00 (GMT-07:00)
          Isn’t That Spatial: Spatial Data for the App Dev – presented by Jason Horner
      • Virtualization: Dec 11 – 10:00 (GMT-07:00)
          Virtualization Q&A with Brent Ozar
      • Business Intelligence: Dec 17 – 12:00 (GMT+12:00)
          Inferred Dimension Members within MDS and SSIS – presented by Reza Rad
      • Big Data: Dec 17 – 14:00 (GMT-05:00)
          Building a Big Data Architecture – presented by Joey D’Antoni
      • Global Russian: Dec 18
          SQL Server Data Quality Services – presented by Константин Хомяков
  • 7. PASS Virtual Chapters Listing
      Check out sqlpass.org for more information on all the Virtual Chapters including:
      • Application Development
      • Azure
      • Big Data
      • Book Readers
      • Business Analytics
      • Business Intelligence
      • Data Architecture
      • DBA
      • DBA Fundamentals
      • Healthcare
      • Master Data/Data Quality
      • Performance
      • Oracle
      • Powershell
      • Professional Development
      • Security
      • Virtualization
      • Women in Technology
      • Virtual PASS Portuguese
      • Global Chinese
      • Global Spanish
      • Global Russian
  • 8. JOIN US for our second annual event to get the best learning for analyzing, managing, and sharing business information and insights through the Microsoft Data Platform of technologies.
  • 9. Stay Involved!
      • Sign up for a free membership today at sqlpass.org
      • LinkedIn: Professional Association for SQL Server
      • Facebook: Professional Association for SQL Server Group
      • Twitter: @SQLPASS
      • The PASS Blog: sqlpass.org
  • 10. General Introductions… My Technical Focus is… My name is… I work at… I’ve worked on … for …. years. My hobbies? well…
  • 11. Compression a hidden Gem for IO heavy Databases Charley Hanania :: QS2 AG – Quality Software Solutions B.Sc (Computing), MCP, MCDBA, MCITP, MCTS, MCT, Microsoft MVP: SQL Server Principal Consultant - Senior Database Specialist
  • 12. Presentation Summary The limiting factor in most database systems is the ability to read and write data to the IO subsystem. We're still using storage layouts and methodologies in SQL Server that are a reflection of old spinning media in times gone by. Until major changes are made to the internal storage layouts, we have "some" hope with options such as data compression, sparse columns and filtered indexes, which not only save space on disk, but also reflect a saving in memory. In this session we will go over the IO savings technologies presented in SQL Server, and discuss how implementing some of these will assist in your operational performance goals.
  • 13. The idea behind compression
      • Save on storage costs
          • SAN based storage
          • Backup storage
      • Improve Read Performance
          • IO is generally the worst performer in Database infrastructure
          • Lower IO & Higher CPU = reduced transactional turnaround*
      * In most cases
  • 14. Vardecimal Implementation
      • Introduced in SQL Server 2005 Service Pack 2 (SP2)
      • Marked as deprecated shortly thereafter in preference to Row and Page Compression
      • Applied to a whole table
      • Stored using scientific notation
          • (sign) * mantissa * 10^exponent
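For reference only, a sketch of how vardecimal storage was switched on in SQL Server 2005 SP2. The database name MyDb and the table dbo.Prices are hypothetical placeholders; because the feature is deprecated, row compression is the option to reach for in later versions.

```sql
-- Deprecated feature, shown for illustration only.
-- MyDb and dbo.Prices are hypothetical names, not taken from the slides.

-- 1. Enable the vardecimal storage format at the database level.
EXEC sp_db_vardecimal_storage_format N'MyDb', 'ON';

-- 2. Switch one whole table over to the vardecimal storage format.
EXEC sp_tableoption N'dbo.Prices', 'vardecimal storage format', 1;
```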
  • 15. Row Compression Implementation
      • Only changes the physical storage format of the data that is associated with a data type.
      • Uses variable-length storage format for numeric types.
      • Stores fixed character strings using variable-length format by not storing the blank characters.
      • NULL and 0 values across all data types are optimized and take no bytes.
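A minimal sketch of enabling row compression and confirming the result, assuming a hypothetical table dbo.OrderHistory (not from the slides):

```sql
-- Hypothetical table name; rebuilds the heap or clustered index with ROW compression.
ALTER TABLE dbo.OrderHistory
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Confirm the compression setting recorded for each index and partition.
SELECT  OBJECT_NAME(p.object_id) AS table_name,
        p.index_id,
        p.partition_number,
        p.data_compression_desc
FROM    sys.partitions AS p
WHERE   p.object_id = OBJECT_ID(N'dbo.OrderHistory');
```

ALTER TABLE ... REBUILD only touches the heap or clustered index. Nonclustered indexes keep their own setting and need a separate ALTER INDEX ... REBUILD.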
  • 16. Page Compression Implementation
      • Does the following:
          1. Row compression
          2. Prefix compression*
          3. Dictionary compression
          4. Unicode Compression
      Prefix Compression:
      • Value is identified that can be used to reduce the storage space for the values in each column.
      • Row is created to identify the prefix.
      • Repeated prefix values in the column are replaced by a reference
      * at the byte level, page compression is data type agnostic
      Images from SQL BOL
  • 17. Page Compression Implementation
      • Does the following:
          1. Row compression
          2. Prefix compression
          3. Dictionary compression
          4. Unicode Compression
      Dictionary Compression:
      • Searches for repeated values anywhere on the page, and stores them in the CI area
      • Is not restricted to one column as per prefix
      CI = compression information structure
  • 18. Page Compression Implementation
      • Does the following:
          1. Row compression
          2. Prefix compression
          3. Dictionary compression
          4. Unicode Compression
      Unicode Compression:
      • Works on Row and Page compressed data
      • Reduces length of datatype storage due to underutilised storage (double byte) capability.
      • NCHAR & NVARCHAR data can see savings of up to 50%*
      • Does not work for off-row data or nvarchar(max)
      * 50% for data that is wholly single byte
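A sketch of applying page compression to every index on a table and then checking how often pages were actually compressed. The table dbo.OrderHistory is a hypothetical placeholder.

```sql
-- Hypothetical table name; rebuilds all of its indexes with PAGE compression.
ALTER INDEX ALL ON dbo.OrderHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Page compression is attempted per page and only kept when it saves space.
-- These counters show attempts versus successes since the metadata was cached.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        s.index_id,
        s.partition_number,
        s.page_compression_attempt_count,
        s.page_compression_success_count
FROM    sys.dm_db_index_operational_stats(
            DB_ID(), OBJECT_ID(N'dbo.OrderHistory'), NULL, NULL) AS s;
```

A low success count relative to attempts suggests the data on those pages has few repeating prefixes or values, in which case row compression alone may be the better trade-off for CPU.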
  • 19. Identifying what would be good for compression. Experiment
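The code used in this experiment is not part of the transcript. As one possible approach, the sketch below ranks indexes by how scan-heavy or update-heavy they have been. The working assumption is that scan-heavy, rarely updated indexes tend to be good compression candidates, while update-heavy ones pay more CPU for it.

```sql
-- Share of leaf-level operations that were updates versus range scans,
-- per index and partition in the current database.
SELECT  o.name AS table_name,
        i.name AS index_name,
        s.partition_number,
        s.leaf_update_count * 100.0 / NULLIF(t.total_ops, 0) AS percent_update,
        s.range_scan_count  * 100.0 / NULLIF(t.total_ops, 0) AS percent_scan
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS s
JOIN    sys.objects AS o
        ON o.object_id = s.object_id
JOIN    sys.indexes AS i
        ON i.object_id = s.object_id
       AND i.index_id  = s.index_id
CROSS APPLY (SELECT s.range_scan_count + s.leaf_insert_count
                  + s.leaf_delete_count + s.leaf_update_count
                  + s.leaf_page_merge_count
                  + s.singleton_lookup_count AS total_ops) AS t
WHERE   t.total_ops <> 0       -- skip indexes with no recorded activity
  AND   o.type = 'U'           -- user tables only
  AND   o.is_ms_shipped = 0
ORDER BY percent_scan DESC;
```

The counters only cover activity since the index metadata was last cached, so run this after a representative workload period.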
  • 20. Sparse Columns
      • A type of column storage where a NULL takes no storage at all
          • normal NULL columns require space to indicate that they are NULL
      • The data is stored internally in a different form
      • The keyword SPARSE is used in the column creation definition
      • Sparse columns can also be used as a column set, which then allows you to return data that is only in sparse columns
      • Best used when you have many columns that are mostly going to be empty/NULL, otherwise slightly inefficient.
  • 21. A look into how to implement & digest sparse columns Experiment
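The experiment's own code is not in the transcript. The following is a minimal sketch of sparse columns with a column set, using a hypothetical table dbo.ProductAttributes and made-up column names.

```sql
-- Hypothetical table: most products fill in only a few of the optional attributes.
CREATE TABLE dbo.ProductAttributes
(
    ProductId     INT           NOT NULL PRIMARY KEY,
    Name          NVARCHAR(100) NOT NULL,
    Colour        NVARCHAR(30)  SPARSE NULL,
    WeightKg      DECIMAL(9,3)  SPARSE NULL,
    Voltage       INT           SPARSE NULL,
    -- Column set: exposes all non-NULL sparse values as one XML fragment.
    AllAttributes XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Only Colour is supplied; the other sparse columns stay NULL and take no storage.
INSERT INTO dbo.ProductAttributes (ProductId, Name, Colour)
VALUES (1, N'Kettle', N'Red');

-- The column set returns only the sparse columns that hold a value.
SELECT ProductId, Name, AllAttributes
FROM   dbo.ProductAttributes;

-- Individual sparse columns can still be queried by name.
SELECT ProductId, Colour, Voltage
FROM   dbo.ProductAttributes;
```

Two things worth knowing before trying this: once a column set exists, SELECT * returns the column set instead of the individual sparse columns, and a table with sparse columns cannot also use data compression.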
  • 22. Filtered Indexes
      • An index with a WHERE clause!
      • Can be used with Index Intersection etc in the Query Optimiser
      • Allows statistics to be more focused as the dataset within the index is distilled
      • Compression Capable
      • Overhead is in the Administration
  • 23. Simple example of efficiency introduced through a Filtered Index Experiment
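The experiment's code is not in the transcript. As a sketch of the idea, the example below builds a compressed filtered index over the small "open" slice of a hypothetical dbo.Orders table; the table, columns and the 'OPEN' status value are all assumptions.

```sql
-- Hypothetical table and columns. Only rows with Status = 'OPEN' are indexed,
-- so the index (and its statistics) cover a small, focused subset of the table.
CREATE NONCLUSTERED INDEX IX_Orders_Open
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalDue)
WHERE Status = 'OPEN'
WITH (DATA_COMPRESSION = PAGE);

-- A query whose predicate matches the filter can be answered from the
-- filtered index alone.
SELECT CustomerId, OrderDate, TotalDue
FROM   dbo.Orders
WHERE  Status = 'OPEN'
  AND  CustomerId = 42;
```

The administrative overhead mentioned on the slide shows up here: the filter predicate has to be kept in step with how the application actually queries, and queries that parameterise the filtered column may not be matched to the index.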
  • 24. Links and Resources
      • Storing Decimal Data As Variable Length
      • Mark Rasmussen
      • The Anatomy of Vardecimals
      • SQL Server Diagnostic Information Queries
      • Glenn Berry
      • Pro SQL Server 2012 Relational Database Design and Implementation
      • Microsoft SQL Server 2008 Internals
      • SQL Server 2014: Native backup encryption & compression
      • Aaron Bertrand
      • Tonight’s T-SQL Code (http://bit.ly/1eiTOwT)
  • 25. Thank you!