1. RockSolid SQL Server Presentation on Compression
2. Disclaimer
We recommend that you seek further professional advice before
deciding on the suitability of any recommendations in this presentation
for you. While all care is taken to ensure information is accurate, we do
not make any guarantees about the suitability of advice and advice
given may contain technical inaccuracies or typographical errors. The
answers provided are our own opinions and may differ from advice
provided by Microsoft and/or other SQL Server professionals. In no
event shall we be liable for any special, incidental, indirect, economic or
consequential damages or for loss of profit, revenue or data howsoever
caused, regardless of whether we could foresee or were advised of the
possibility or likelihood of such loss or damage.
3. Compression
What is it -
– Objects that can be configured
– Flexibility compression offers a DBA
– How data compression works
– Different types of compression
How do you determine -
– If it’s worth using compression
– If you can use compression
– When to compress
– How to use compression
Why would I use it -
– Case studies
– Advantages & disadvantages
– Best practices
This presentation is for DBAs considering using compression
4. What is compression?
Compression is a tool that the DBA can use to –
• Make efficient use of disk space which can translate into a cost
saving.
• Potentially, increase I/O performance as more data is stored in
memory.
• Data compression can be configured for a number of specific
database objects.
– A table stored as a heap or clustered index, a non-clustered
index, an indexed view, or partitioned tables and indexes
• It’s flexible: you don’t need to compress all objects in a database.
You can choose which objects to compress and which type of
compression to use.
– For example:
• Row compress some tables, page compress others and leave
others uncompressed
• Row compress some table PARTITIONS, page compress
others and leave others uncompressed
5. What is compression?
How does compression work?
• Compression is handled entirely by the storage engine & most other
components within SQL Server are not aware compression exists
• Data is compressed both on disk & in memory, so more data fits on
fewer pages
[Diagram: the Relational Engine sits above the Storage Engine, which
compresses and decompresses data. Data is compressed in the Storage
Engine, in the buffer cache and in the files on disk.]
6. What is compression?
Different types of compression?
• Row compression: Saves space by storing fixed-length data types
in a variable-length format
[Figure: a Char(20) column holding "Sean". Before: "Sean" followed by
free space. After: "Sean" only, with the free space removed.]
• Page compression: Uses row compression, then optimizes
storage by removing repeating patterns of data & replacing them
with abbreviated references. The repeated data is stored once in
the Compression Information (CI) structure.
– Prefix
– Dictionary
– Page compression progresses in the following order:
Row, then Prefix, then Dictionary
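The Char(20) example above can be sketched as a toy model. This is not SQL Server's real on-disk format: it assumes one byte per character, ignores per-row metadata, and the names other than "Sean" are made up for illustration.

```python
# Toy model of row compression for a CHAR(n) column.
# Assumptions: one byte per character, no per-row metadata overhead.

def fixed_length_bytes(value: str, declared_length: int) -> int:
    """Bytes used when the value is padded out to the declared length."""
    if len(value) > declared_length:
        raise ValueError("value does not fit in the column")
    return declared_length


def variable_length_bytes(value: str) -> int:
    """Bytes used when trailing padding is not stored."""
    return len(value.rstrip(" "))


# "Sean" comes from the slide; the other names are hypothetical.
names = ["Sean", "Alexandra", "Jo"]
before = sum(fixed_length_bytes(name, 20) for name in names)
after = sum(variable_length_bytes(name) for name in names)
print(f"before: {before} bytes, after: {after} bytes")
```

In this model short values in wide fixed-length columns save the most, which is why fixed-length character and numeric types are the usual candidates for row compression.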
7. What is compression?
What is Prefix & Dictionary?
– Stores repeating data once
• Prefix:
– Looks for repeating byte patterns at the START (prefix) of the
values in a given column across all rows on each data page
• Dictionary:
– Searches for repeating byte patterns ANYWHERE across all
columns and all rows on each data page
– Stores Compression Information (CI) on the same data page,
immediately following the page header.
Why have Row & Page compression options?
– On average rebuilding an index using –
• Row compression takes approx 1.5x normal CPU usage
• Page compression takes approx 2x - 5x normal CPU usage
8. What is compression?
How does the page store Prefix & Dictionary compression?
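[Figure: an uncompressed page shown next to the same page after prefix
compression. Both pages start with a page header. On the compressed
page, values such as aaabb and aaaab are replaced by short tokens such
as 4b, and a value that shares nothing with the column's prefix, such
as bbbb, is stored as 0bbbb.]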
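The "4b" and "0bbbb" entries in the prefix example can be reproduced with a small sketch. It is a simplified model, not the storage engine's actual algorithm: the prefix (anchor) value for each column is passed in rather than chosen automatically, and the matched length is assumed to fit in one digit.

```python
# Toy prefix compression: each value becomes
# "<number of leading characters shared with the anchor><remaining characters>".
# Assumptions: the anchor is supplied by the caller and shared lengths are 0-9.

def common_prefix_len(value: str, anchor: str) -> int:
    """Count the leading characters that value and anchor have in common."""
    count = 0
    for a, b in zip(value, anchor):
        if a != b:
            break
        count += 1
    return count


def prefix_encode(value: str, anchor: str) -> str:
    shared = common_prefix_len(value, anchor)
    if shared > 9:
        raise ValueError("this toy format only supports one-digit lengths")
    return f"{shared}{value[shared:]}"


def prefix_decode(token: str, anchor: str) -> str:
    shared = int(token[0])
    return anchor[:shared] + token[1:]


print(prefix_encode("aaabb", "aaabcc"))   # 4b
print(prefix_encode("bbbb", "aaaacc"))    # 0bbbb
```

Decoding needs only the token and the anchor, which is why the anchor values can be stored once per page in the CI structure.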
9. What is compression?
Backup compression
– Offers no options other than to turn it on or off
– If you are using data compression or encryption you may not
notice any benefit
– Using backup compression has the following advantages:
• A typical database backup will be some 60%-80% smaller
when backup compression is used
• We’ve found that using backup compression reduces disk
space issues during maintenance windows as the backup is
smaller.
• Quicker to back up a database using compression.
• Quicker to restore a compressed backup resulting in a faster
recovery.
• It can save you money when investing in DR solutions.
10. What is compression? - Summary
• We’ve talked about -
– What compression is and how SQL Server manages it
– The various objects that can be compressed & the flexibility this
offers
– The various compression options
• Data
– Row
– Page
• Backup
11. How do I use compression?
Is it worth using compression in your environment?
• Estimate disk space savings
– Use the sp_estimate_data_compression_savings stored
procedure, which takes a sample of the data and compresses it in tempdb.
• Understand the workload & determine % of updates & scans for
different objects
• Understand the structure - Data types that compress are mainly
numeric or fixed length character types.
– Smallint, Int, Bigint, Decimal, Numeric, Bit, Smallmoney, Money,
Float, Real, Datetime, Char, nchar
– NULL data values or repeating data will compress well
– Consider the type of database workload (e.g. data warehouse)
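As a rough sketch of the estimation step, the helper below only renders the call to sp_estimate_data_compression_savings as text and turns the two size figures it returns into a percentage. It does not connect to a server. The function names and the dbo.SalesHistory table are hypothetical; the parameter names are those of the stored procedure.

```python
# Hypothetical helpers: build the estimate call and interpret its result.
ALLOWED_COMPRESSION = {"NONE", "ROW", "PAGE"}


def build_estimate_call(schema: str, table: str, compression: str) -> str:
    """Render the T-SQL for a whole-table estimate (all indexes, all partitions)."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"unsupported compression type: {compression}")
    # Double any single quotes so the names stay inside their string literals.
    schema = schema.replace("'", "''")
    table = table.replace("'", "''")
    return (
        "EXEC sp_estimate_data_compression_savings "
        f"@schema_name = '{schema}', @object_name = '{table}', "
        "@index_id = NULL, @partition_number = NULL, "
        f"@data_compression = '{compression}';"
    )


def saving_pct(current_kb: float, requested_kb: float) -> float:
    """Percentage saved, given the current and estimated sizes in KB."""
    if current_kb <= 0:
        raise ValueError("current size must be positive")
    return round(100 * (current_kb - requested_kb) / current_kb, 1)


print(build_estimate_call("dbo", "SalesHistory", "PAGE"))
print(saving_pct(1000, 400))
```

Running the estimate for both ROW and PAGE on the same object gives the two figures needed to decide whether the extra CPU cost of page compression is worth it.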
12. How do I use compression?
Is it worth using compression in my environment? (continued)
• Data types that don’t compress
– Smalldatetime, date, time, varchar(max), nvarchar(max), text,
ntext, image, cursor, xml, user-defined data types (varbinary)
– Data with few repeating values, encrypted data
– FILESTREAM unstructured data (video, images, documents etc.)
Can I compress in my environment?
13. How do I use compression?
When to use compression?
• If you decide to compress a table or index make sure it is done in a
low activity or maintenance window
• Online vs. offline
• Compression is transparent. It’s not obvious what's compressed so
make sure you maintain compression as the data/workload changes.
You can compress a table or index when it is created, or after it
has been created, using:
– CREATE INDEX or CREATE TABLE
– ALTER INDEX or ALTER TABLE
– BACKUP DATABASE .. COMPRESSION
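The statements listed above might look like the following. This sketch only renders T-SQL text for review and does not execute anything; the helper names, the SalesDb database, the dbo.SalesHistory table and the index name are all hypothetical.

```python
# Hypothetical helpers that render compression statements as T-SQL text.

def quote_name(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def rebuild_options(compression: str, online: bool) -> str:
    if compression not in {"NONE", "ROW", "PAGE"}:
        raise ValueError(f"unsupported compression type: {compression}")
    options = [f"DATA_COMPRESSION = {compression}"]
    if online:
        options.append("ONLINE = ON")
    return ", ".join(options)


def rebuild_table_sql(schema: str, table: str, compression: str, online: bool = False) -> str:
    return (f"ALTER TABLE {quote_name(schema)}.{quote_name(table)} "
            f"REBUILD WITH ({rebuild_options(compression, online)});")


def rebuild_index_sql(index: str, schema: str, table: str, compression: str, online: bool = False) -> str:
    return (f"ALTER INDEX {quote_name(index)} ON {quote_name(schema)}.{quote_name(table)} "
            f"REBUILD WITH ({rebuild_options(compression, online)});")


def backup_sql(database: str, path: str) -> str:
    safe_path = path.replace("'", "''")
    return f"BACKUP DATABASE {quote_name(database)} TO DISK = N'{safe_path}' WITH COMPRESSION;"


print(rebuild_table_sql("dbo", "SalesHistory", "PAGE"))
print(rebuild_index_sql("IX_SalesHistory_Date", "dbo", "SalesHistory", "ROW", online=True))
print(backup_sql("SalesDb", "D:/Backups/SalesDb.bak"))
```

Generating one statement per object fits the advice to compress one table, index or partition at a time, and makes it easy to review the script before a maintenance window.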
14. How do I use compression?
How do I reclaim space?
After you’ve compressed an object, the freed space stays inside the
data file. You have three options:
• Option 1 – Shrink the file to reclaim the space. However, this
can cause fragmentation.
• Option 2 – Create a new, compressed, empty table and copy the
data to the new table.
• Option 3 – Don’t reclaim the disk space, and keep the released
free space in the filegroup for future data growth.
16. How do I use compression - Summary
We’ve talked about –
– How to estimate compression savings
– Why you should understand the workload and type of data your
database is storing
– Which data types compress well
– Which data types don’t compress very well
– How to implement compression for new and existing objects
– Which versions of SQL Server have compression options
– Online vs. offline
– Reclaiming space post compression
– A couple of practical examples of implementing SQL Server
compression
17. Why use compression?
Case study implementing compression for JDE
• Disk space issues resulting in SRs - Want to save on disk space &
perhaps increase performance
• ENABLE BACKUP COMPRESSION by default and compare before and
after stats
• DATA COMPRESSION - Understand the workload!
• Take a performance baseline without compression.
• ESTIMATE compression savings (row, page)
• Compress in test env first.
• Testing successful: compress smallest objects. Re-iterate the
process. Take performance stats & compare
• If significant CPU pressure is being experienced, consider using
Row compression only
• End result: 570GB db down to 160GB - I/O response times improved
18. Why use compression?
• Advantages
– Can significantly reduce disk space
– Can significantly reduce I/O
– Can significantly improve performance as more data is stored in
memory
– Works extremely well for databases whose data is not updated
frequently.
• Disadvantages
– Only certain data types will compress
– If you have CPU issues, compressing database objects may
intensify those issues
19. Why use compression?
Best Practices
– Determine if you’ll benefit from compression
• Type of data (image, character etc.)
• What is the database used for (Datawarehouse, OLTP)?
• Consider the database workload. What tables are used for
scans and updates?
• Use sp_estimate_data_compression_savings to estimate
compression savings
– Determine if you are suffering from high CPU usage before
implementing compression.
– Compress during maintenance windows as indexes are rebuilt
when initially compressed
– Test compression in test environment first to see what benefit
you achieve
– Regularly check your environment for uncompressed objects
20. Why use compression?
Best Practices (continued)
– Plan your compression
• Compress one table, index or partition at a time rather than
concurrently
• If you’ve decided to compress a number of objects start with
the smallest in the list.
• Decide if you want to compress while Online or Offline. Offline
is faster, but the table is locked for the duration of the
compression operation.
21. Summary
• Implement backup compression unless you have a good reason not
to do so as the impact is usually low and space savings high.
• The JDE case study is an example where compression was a success for us
• Compression can offer you substantial cost saving in terms of disk
space usage and implementing DR solutions.
• Compression can increase database performance resulting in fewer
user issues and fewer issues for the DBA
• Beyond the initial analysis, data compression does not take a
great deal of time to maintain
22. References
• MS Books on-line
• https://www.dbaouttask.com/blog
• http://www.sqlservercentral.com
• http://blogs.msdn.com/b/sqlserverstorageengine/archive/tags/data+compression/
• http://bradmcgehee.com/
Contact
• Sean Young
• youngs@redrock.net.au
For example, at RockSolid SQL we perform a lot of maintenance and performance health checks for clients, and quite often we identify I/O as a performance bottleneck. We also identify disk space as a key problem. Compression can, in part, help alleviate those issues.
Environment: 2008 R2 Enterprise. RockSolid takes samples of data over a given period of time and can accurately predict disk space usage. We noticed that this particular environment was about to suffer from space issues. We discussed a variety of options with the client, from adding additional space to using compression. As the client was reluctant to spend more money on storage, we decided to implement compression.
Step 1 – We enabled backup compression by default. We were able to determine that we achieved a compression ratio of 4:1.
Step 2 – We looked at implementing data compression by first understanding the workload. We did this by targeting tables that were used predominantly for scans, and by talking to the client and the JDE support consultants.
Step 3 – We took a performance baseline prior to implementing compression.
Step 4 – We estimated compression on a number of indexes and tables.
Step 5 – We tested compressing tables and indexes in a test environment. Testing proved successful. We regained a great deal of disk space and performance actually improved.
Step 6 – We rolled out the compression changes to the production environment. We started with the smallest objects and used ROW compression first. We then took another performance baseline and compared the BEFORE and AFTER stats. We then used PAGE compression on those objects and again compared performance. We repeated this process. As we gained confidence we PAGE compressed larger and larger tables and indexes.
The end result was that we reduced storage in this JDE environment from 570GB to 160GB. The total time it took to compress was 2 hours for tables, 6.5 hours for indexes and 3.5 hours for the shrink file.
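The figures quoted in the case study can be put on a common footing with a few lines of arithmetic. The sketch below uses only the numbers given above (570GB to 160GB, a 4:1 backup ratio, and the three durations); the function name is illustrative.

```python
# Arithmetic on the case study figures quoted in the notes.

def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a size shrinks going from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100 * (before - after) / before


data_reduction = reduction_pct(570, 160)   # database size in GB
backup_reduction = reduction_pct(4, 1)     # a 4:1 compression ratio
total_hours = 2 + 6.5 + 3.5                # tables + indexes + shrink file

print(f"data files: {data_reduction:.1f}% smaller")   # 71.9% smaller
print(f"backups: {backup_reduction:.1f}% smaller")    # 75.0% smaller
print(f"total elapsed: {total_hours} hours")          # 12.0 hours
```

The 4:1 backup ratio works out to a 75% reduction, which sits inside the 60%-80% range quoted earlier for backup compression.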