Combining IBM Real-time Compression and IBM ProtecTIER Deduplication



Learn how combining IBM Real-time Compression and IBM ProtecTIER Deduplication, two complementary storage optimization technologies, helps achieve compelling results.


IBM Systems and Technology
Thought Leadership White Paper
July 2011

Benchmark tests show that combining storage optimization technologies achieves compelling results
Contents

● Introduction
● Landmarks in the data optimization landscape
● The need for data optimization in database backup and recovery
● The test environment: An overview
● Test 1: ProtecTIER deduplication only
● Test 2: IBM Real-time Compression and ProtecTIER deduplication
● Summary
● For more information

Introduction

As the capacity and overhead of powering, cooling and managing larger amounts of storage continue to outpace the growth of storage budgets, IT decision makers are increasingly looking to optimization technologies to meet capacity demands while minimizing capital expenditures. Recently, two storage optimization approaches in particular have been receiving significant attention in the industry: real-time compression for primary and secondary data, and data deduplication for highly redundant backup data sets. Although sometimes viewed as mutually exclusive, the two technologies are, in fact, complementary.

This paper discusses the compelling financial and operational advantages of deploying real-time compression and data deduplication in conjunction, as demonstrated by the results of tests in which IBM Real-time Compression and IBM® ProtecTIER® Deduplication solutions were combined to optimize Oracle database physical backups in a Network File System (NFS) environment.

The compelling results of combining IBM solutions for real-time compression and data deduplication in the Oracle database environment include:

● Greater than 82 percent immediate savings on initial write to disk.
● Greater than 96 percent overall data reduction when combined with deduplication.
● Up to 71 percent reduction in backup time.
● Less CPU utilization on the deduplication engine.
● Less disk activity in the deduplication subsystem.
● Less network traffic on the deduplication backup network.
Figure 1: The data optimization landscape

Detailed test results are included later in this document. The bottom line is that combining real-time compression and data deduplication optimizes your overall storage footprint by reducing data on your primary NAS devices as well as throughout your data life cycle. By combining these two technologies, you can achieve maximum data reduction, which maximizes your return on investment and dramatically improves your data protection performance and capabilities.

Landmarks in the data optimization landscape

To fully appreciate the benefits of combining real-time compression and data deduplication, it’s important first to understand how each technology works, the differences between them, and where they fit in the overall storage architecture, as shown in Figure 1.
Real-time compression

Data compression reduces the size of data files so that less space is required to store them. Real-time compression, as the name implies, is the ability to compress data in real time (before it is written to the hard disk rather than after) without any noticeable performance degradation.

Designed to sit transparently in front of primary Network Attached Storage (NAS), IBM Real-time Compression offers the unique advantage of making it possible to shrink primary, online data in real time with no loss in speed. With over 30 patents, it reduces the size of every file you create by up to five times, depending upon the file type. It significantly reduces the physical capacity required to store a file (or copies and permutations of a file) through the entire data life cycle, including backup.

IBM Real-time Compression also has a feature called the Compression Accelerator that enables the non-disruptive compression of data that has already been saved to disk, while applications continue to have random, read-write access to the data. IBM Real-time Compression can also significantly enhance overall network and storage performance, since less data is written to disk and more data can be stored in the storage cache.

As more server workloads become virtualized, real-time compression becomes increasingly valuable as a tool for storage optimization in virtualized environments. The technology works particularly well given the compression rates associated with virtualized files (see Table 1). As a result, many companies that have adopted file virtualization technologies are also exploring deployment of IBM Real-time Compression in conjunction with file virtualization. IBM Real-time Compression solutions transparently integrate with file virtualization solutions and can dramatically extend the cost reductions that file virtualization enables.
Table 1: Compression rates

File type                          Compression rate
Database                           Up to 85 percent
Microsoft Office                   Up to 20-60 percent
VMware VMDK (virtualized files)    Up to 72 percent
CAD/CAM                            Up to 70 percent
Oil and Gas                        Up to 50 percent
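To make the rates in Table 1 concrete, the following minimal sketch turns them into best-case size estimates. The dictionary keys and the function name `best_case_size_gb` are hypothetical names chosen for illustration, and because the table quotes "up to" figures, the results are optimistic bounds rather than expected outcomes.

```python
# Upper-bound compression rates from Table 1, as the fraction of space saved.
# These are "up to" figures, so every result is a best-case estimate only.
MAX_COMPRESSION_RATE = {
    "database": 0.85,
    "microsoft_office": 0.60,  # Table 1 quotes a 20-60 percent range; upper end used
    "vmware_vmdk": 0.72,
    "cad_cam": 0.70,
    "oil_and_gas": 0.50,
}


def best_case_size_gb(original_gb: float, file_type: str) -> float:
    """Return the smallest stored size that Table 1 would allow for a file type."""
    rate = MAX_COMPRESSION_RATE[file_type]
    return original_gb * (1.0 - rate)


if __name__ == "__main__":
    for file_type in MAX_COMPRESSION_RATE:
        size = best_case_size_gb(100.0, file_type)
        print(f"{file_type}: 100 GB -> {size:.1f} GB at best")
```

Actual savings depend on the data itself, so a sizing exercise would normally replace these upper bounds with rates measured on a sample of the real files.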
Data deduplication

Data deduplication is designed to reduce the physical storage required to store redundant data. The deduplication process removes duplicate data and replaces it with a pointer to the main copy, leaving only one copy of the data that actually has to be stored. This is why it is well suited for backup data, where there are typically multiple data sets (daily/weekly, for example) of mostly redundant data. The more copies of redundant data you have, the higher your effective deduplication rate.

IBM ProtecTIER Deduplication solutions feature revolutionary and patented HyperFactor® data deduplication technology. They provide enterprise-class performance, scalability and proven enterprise-level data integrity to meet disk-based data protection needs while enabling significant infrastructure cost reductions. They specifically provide improved backup performance, up to 2000 MB/sec (7.2 TB/hour) sustained inline deduplication, and even faster restores at up to 2800 MB/sec (10 TB/hour). They also provide:

● The ability to scale to 1 PB of physical storage.
● A reduction in storage capacity consumption of up to 25 times or more.
● A non-hash-based approach that protects data integrity by reducing the risk of data loss due to hash collision.

Technology differences

As described above and illustrated in Figure 1, real-time compression and data deduplication technologies address different problems and sit at different points in the data life cycle. But more importantly, the two technologies are complementary; in particular, deploying real-time compression significantly enhances the value and performance of data deduplication. This conclusion has been demonstrated in a series of performance tests, which are described in detail in this paper.
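The pointer-replacement idea described under "Data deduplication" can be illustrated with a toy sketch. It is not how ProtecTIER works: the paper states that HyperFactor is non-hash-based, whereas this example uses fixed-size chunks keyed by a SHA-256 digest purely because that is the simplest way to show duplicates being stored once. The names `deduplicate`, `restore` and `CHUNK_SIZE` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns a list of pointers (chunk digests) that describes the data.
    The store maps each digest to the single stored copy of that chunk.
    """
    pointers = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # store the chunk only if it is new
        pointers.append(key)
    return pointers


def restore(pointers: list, store: dict) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[key] for key in pointers)


if __name__ == "__main__":
    store = {}
    monday = deduplicate(b"AAAABBBBCCCC", store)
    tuesday = deduplicate(b"AAAABBBBDDDD", store)  # only the last chunk changed
    stored = sum(len(chunk) for chunk in store.values())
    print(f"nominal bytes: 24, stored bytes: {stored}")
    assert restore(tuesday, store) == b"AAAABBBBDDDD"
```

The second backup adds only one new chunk to the store, which mirrors the point made above: the more redundant copies there are, the higher the effective deduplication rate.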
The need for data optimization in database backup and recovery

In general, backup and recovery refers to the various strategies and procedures involved in protecting a database against data loss and allowing for the reconstruction of the database after any kind of disaster. The performance and reliability of backup and recovery operations are critical to effective database operation. Physical backups are backups of the physical files used in storing and recovering your database, such as data files, control files and archived redo logs.

Ultimately, every physical backup is a copy of files storing database information to some other location, whether on disk or some offline storage media such as tape. Backup performed after a database is properly shut down is called cold database backup. Conversely, backup performed when a database is online and fully functional is called hot database backup. During a cold backup, the database is shut down and unavailable; obviously, any technology that reduces the period of time that the database is offline is advantageous. In either case, due to the tremendous growth in data, it is becoming increasingly difficult for backups to complete within designated backup windows.
Figure 2: The test environment (diagram labels: IBM TS7610 ProtecTIER Deduplication Appliance Express; Tivoli Storage Manager; NDMP backup; Gigabit Ethernet switch; Oracle database; 3x DB clients; Quest Benchmark Factory; IBM Real-time Compression; IBM N5600-A10; Brocade 4100; LAN)
Although the benefits of utilizing database backups (either cold or hot) are clear, their use can result in the creation of large amounts of data that must reside in the storage environment, taking up precious disk space and increasing the complexity and cost of backup procedures. This is why primary storage optimization is so important.

The test environment: An overview

In order to simulate accurate and realistic data storage scenarios, IBM used Quest Software’s Benchmark Factory and Data Factory to create and populate an Oracle database running over NFS to an IBM System Storage® N5600-A10 storage controller. A 37 GB baseline database was then used to test the effects of data deduplication and compression, respectively. Each test had a seven percent daily change rate that was simulated between each database copy. Seven database copies were then taken to simulate a week’s worth of Oracle data sets in an enterprise environment through a combination of updates to existing data, additions of new data, and other database activities such as delete, drop, create and remove.

Backup using Network Data Management Protocol (NDMP)

The test environment consisted of an IBM TS7610 ProtecTIER Deduplication Appliance Express acting as a Virtual Tape Library attached to an IBM Tivoli® Storage Manager server. In such a configuration, the Tivoli Storage Manager server controls the virtual tape library through a direct physical connection to the library robotics control port. (The library robotics, the IBM Tivoli Storage Manager server and the NAS file server are all connected over Fibre Channel.) For NDMP operations, the drives in the library were connected directly to the NAS file server, with a path defined from the NAS head to the virtual drives. The NAS file server transfers data to the virtual tape drives at the request of the IBM Tivoli Storage Manager server.
As shown in Figure 3, to allow Tivoli Storage Manager to use the virtual tape drives for non-NDMP operations, the virtual tape drives were also connected to the Tivoli Storage Manager server, with paths defined from the server to the drives. This configuration also supports an IBM Tivoli Storage Manager storage agent having access to the virtual tape drives for its LAN-free operations.
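The workload described in the test environment overview (a 37 GB baseline, seven copies, and a seven percent change between copies) can be put into numbers with a short sketch. This is an idealized model, not a measured result: it assumes the database stays at 37 GB, that each day's changes do not overlap earlier ones, and that no compression is applied. Both function names are hypothetical.

```python
def nominal_backup_gb(baseline_gb: float, copies: int) -> float:
    """Total data sent to the backup target if every copy is a full backup."""
    return baseline_gb * copies


def idealized_unique_gb(baseline_gb: float, copies: int, daily_change: float) -> float:
    """Rough estimate of unique data across all copies.

    Counts the first copy in full, then only the changed fraction of each
    later copy. Assumes a constant database size and non-overlapping changes,
    which a real workload will not match exactly.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return baseline_gb + (copies - 1) * daily_change * baseline_gb


if __name__ == "__main__":
    nominal = nominal_backup_gb(37.0, 7)
    unique = idealized_unique_gb(37.0, 7, 0.07)
    print(f"nominal: {nominal:.0f} GB, idealized unique: {unique:.2f} GB")
```

The gap between the nominal volume and the unique volume is the redundancy that deduplication targets; the measured results reported later in the paper also reflect effects this model leaves out.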
Figure 3: NDMP architecture (diagram components: Tivoli Storage Manager server; NAS file server; NAS file server file system disks; web client (optional); virtual tape library. Legend: SCSI or Fibre Channel connection; TCP/IP connection; data flow; robotics control; drive access)
SAN configuration

All components with Fibre Channel connectivity (Tivoli Storage Manager server, TS7610 ProtecTIER Deduplication Appliance Express and IBM System Storage N5600 storage controller) were connected to a Brocade 4100 SAN switch in the test environment. According to Fibre Channel SAN zoning best practices, five zones were defined, each including one initiator and one target. Four zones were defined to connect ProtecTIER’s virtual tape library robotics and its virtual tape drives to the Tivoli Storage Manager server in a redundant manner, in order to enable Control Path Failover (CPF) and Data Path Failover (DPF) between the Tivoli Storage Manager server and the ProtecTIER virtual tape library. Control Path Failover and Data Path Failover were handled transparently by the IBM Tape Device Driver for Windows (IBM tape) installed on the Tivoli Storage Manager server. In addition, a single virtual drive was zoned to the N series to enable the NAS file server to transfer data directly to a virtual drive.

N Series configuration

The IBM N5600-A10 storage controller configuration included 28 Fibre Channel drives (144 GB, 15k) in a RAID-DP environment. The N5600 operating environment was ONTAP 7.3.3 for the tests.

IBM ProtecTIER Deduplication configuration

A ProtecTIER virtual tape library was defined with two virtual tape drives, each assigned to one Fibre Channel front-end port. To enable Control Path Failover, the virtual robot was assigned to both Fibre Channel front-end ports. ProtecTIER’s LUN masking feature was used to assign specific virtual devices to a specific host running backup application modules. This feature enables multiple initiators to share the same target Fibre Channel port on the ProtecTIER system without having conflicts on the devices that are being emulated.

Ten cartridges were defined to store backup data. To limit the nominal cartridge size, maximum cartridge growth was set to 200 GB. Under these conditions, as soon as a cartridge stores 200 GB of nominal data, it is marked “full” and another cartridge is used to back up data. Ten virtual tape slots were defined in the virtual tape library to house the ten virtual tape cartridges. By default, eight import/export slots were defined.

IBM Real-time Compression configuration

The IBM Real-time Compression Appliance STN6800 (Version 3.7.0) with Gigabit Ethernet ports was used for the tests. 1-Gigabit Ethernet connections were established to the Gigabit Ethernet switch and the N5600 for connectivity between the Oracle server and the N5600 storage controller.
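As a quick check on the cartridge settings in the ProtecTIER configuration, the sketch below works out the nominal capacity of the ten 200 GB cartridges and how many of them seven full 37 GB backups would fill. It assumes a backup stream simply continues onto the next cartridge once one is marked full; the function names are hypothetical.

```python
import math


def library_nominal_capacity_gb(cartridges: int, max_cartridge_gb: float) -> float:
    """Nominal (pre-deduplication) data the virtual library can accept."""
    return cartridges * max_cartridge_gb


def cartridges_needed(nominal_gb: float, max_cartridge_gb: float) -> int:
    """Number of cartridges filled or partly filled by the given nominal data."""
    return math.ceil(nominal_gb / max_cartridge_gb)


if __name__ == "__main__":
    capacity = library_nominal_capacity_gb(10, 200.0)
    week_of_backups = 7 * 37.0  # seven full backups of the 37 GB baseline
    print(f"library nominal capacity: {capacity:.0f} GB")
    print(f"cartridges used for {week_of_backups:.0f} GB: "
          f"{cartridges_needed(week_of_backups, 200.0)}")
```

Under that assumption, the week of uncompressed backups touches two of the ten cartridges, so the configured library has ample nominal headroom for the test.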
Tivoli Storage Manager configuration

A server running IBM Tivoli Storage Manager Extended Edition was installed using Tivoli Storage Manager’s built-in configuration wizards. (An Extended Edition license is required to allow NDMP backups of NAS devices.) The Tivoli Storage Manager database size was configured to 2048 MB and the log size was configured to 1024 MB.

To initiate an NDMP backup from the Tivoli Storage Manager server, the backup node command was used to perform a full backup of the Oracle database files on the N5600 storage controller. A table of contents (TOC) was not created as it is needed only for single file restore. The backup NAS process in Tivoli Storage Manager was monitored to measure backup time.

Test 1: ProtecTIER deduplication only

To illustrate the benefits of data deduplication, tests were performed to deduplicate the seven 37 GB cold backup sets using the ProtecTIER deduplication appliance only. Deduplication was performed during the time the data was copied using NDMP from the N5600 storage controller to the ProtecTIER deduplication appliance.

Test 2: IBM Real-time Compression and ProtecTIER deduplication

To illustrate the added benefits of using real-time compression with deduplication, an IBM Real-time Compression STN6800 appliance was installed in front of the IBM N5600 storage controller. IBM Real-time Compression provided an immediate footprint reduction of the database file size from 37 GB to 6.6 GB, a reduction of over 82 percent.

The introduction of IBM Real-time Compression provided immediate space savings, since the compression was performed in real time, when the data was written to storage. No post processing or configuration changes were required to realize these savings.

Clearly, both data deduplication and compression standing alone offer significant space savings over traditional, non-optimized storage. However, the benefits of combining these technologies are even more compelling.

Figure 4: Backup times for IBM Real-time Compression combined with ProtecTIER Deduplication, compared to deduplication alone (y-axis: backup time in seconds, 0 to 900; x-axis: day of backup, 1 to 7; series: IBM Real-time Compression with IBM ProtecTIER, and IBM ProtecTIER alone)
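The 82 percent figure in Test 2 follows directly from the two sizes quoted there, as the sketch below shows. The second function is a further illustration under a stated assumption: that the 96 percent overall reduction quoted in the introduction is measured against the nominal volume of seven full 37 GB backups. The paper does not spell out that baseline, so the resulting size is only an implied estimate. Both function names are hypothetical.

```python
def reduction_percent(before_gb: float, after_gb: float) -> float:
    """Percentage by which the data shrank from before_gb to after_gb."""
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    return (before_gb - after_gb) / before_gb * 100.0


def size_after_reduction_gb(before_gb: float, reduction_pct: float) -> float:
    """Size that remains after applying a percentage reduction."""
    return before_gb * (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Sizes quoted in Test 2: 37 GB before compression, 6.6 GB after.
    print(f"compression saving: {reduction_percent(37.0, 6.6):.1f} percent")

    # Assumption: 96 percent is relative to seven full 37 GB backups.
    nominal_week_gb = 7 * 37.0
    implied = size_after_reduction_gb(nominal_week_gb, 96.0)
    print(f"implied stored size at 96 percent reduction: {implied:.2f} GB")
```

The first result, about 82.2 percent, matches the "over 82 percent" stated in the text.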
Figure 5: Space used with IBM Real-time Compression combined with ProtecTIER Deduplication, compared to deduplication alone (y-axis: used space in GB, 0 to 25; x-axis: day of backup, 1 to 7; series: IBM Real-time Compression with IBM ProtecTIER, and IBM ProtecTIER alone)

When the ProtecTIER deduplication solution was used to back up the data compressed by IBM Real-time Compression, the seven compressed backup sets were further reduced by an average of 39 percent. In addition, backup of compressed data took an average of 68 percent less time than backup in the absence of IBM Real-time Compression.

Summary

While IBM Real-time Compression and IBM ProtecTIER Deduplication solutions both offer compelling storage and data protection benefits when used individually, the combination of the two technologies has been shown to produce far greater storage efficiency, significantly reduce backup times, and improve utilization of resources. Tests involving Oracle database physical backups have shown that together, these data compression and deduplication solutions are capable of producing benefits far exceeding those found with either technology alone, including 96 percent overall data reduction and up to 71 percent reduction in backup time, as well as better deduplication CPU utilization, less deduplication disk activity and less deduplication network traffic. These results demonstrate strong synergies between real-time compression and deduplication and present a powerful argument for using both in order to achieve storage optimization.

For more information

To learn more about how IBM Real-time Compression and IBM ProtecTIER Deduplication solutions can optimize storage efficiency in your environment, contact your IBM representative.

Additionally, financing solutions from IBM Global Financing can enable effective cash management, protection from technology obsolescence, improved total cost of ownership and return on investment. Also, our Global Asset Recovery Services help address environmental concerns with new, more energy-efficient solutions.
Please Recycle

© Copyright IBM Corporation 2011

IBM Systems and Technology Group
Route 100
Somers, NY 10589
U.S.A.

Produced in the United States of America
July 2011
All Rights Reserved

IBM, the IBM logo, and ProtecTIER are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarks are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information”.

Other company, product and service names may be trademarks or service marks of others.

This paper is intended to provide information regarding IBM Real-time Compression Appliance (RTC) in combination with ProtecTIER Deduplication solutions. It discusses findings based on configurations that were created and tested under laboratory conditions. These findings may not be realized in all customer environments, and implementation in such environments may require additional steps, configurations and performance, compression and deduplication analysis. This information does not constitute a specification or form part of the warranty for any IBM or non-IBM products.

Information in this document was developed in conjunction with the use of the equipment specified and is limited in application to those specific hardware and software products and levels. The information contained in this document has not been submitted to any formal IBM test and is distributed as-is. The use of this information or the implementation of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

IBM may not officially support techniques mentioned in this document. For questions regarding officially supported techniques, please refer to the product documentation or announcement letters, or contact IBM Support.

This document could include technical inaccuracies or typographical errors. IBM may not offer the products, services or features discussed in this document in other countries, and the product information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. Any statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. The information contained in this document is current as of the initial date of publication only and is subject to change without notice.

All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. Questions on the capabilities of the non-IBM products should be addressed with the suppliers.

IBM does not warrant that the information offered herein will meet your requirements or those of your distributors or customers. IBM disclaims all warranties, express or implied, including the implied warranties of noninfringement, merchantability and fitness for a particular purpose. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

TSW03093-USEN-00