Storage Switzerland:  Smarter Primary Storage Through Real-time Compression
Published in: Technology, Business
Smarter Primary Storage Through Real-time Compression

Prepared by: George Crump, Senior Analyst
Prepared on: 7/20/2011
Copyright © 2010 Storage Switzerland, Inc. - All rights reserved

This Storage Switzerland White Paper was commissioned by IBM and is distributed under license from Storage Switzerland.

It is no great revelation that primary storage continues to grow at an alarming rate. What may not be immediately obvious is the ripple effect of that growth: the data supporting every process throughout the storage lifecycle is also increasing in size. This is most typically seen in increased backup storage consumption. What's often missed is the impact of primary storage growth on data management practices like data protection.

The traditional 'solution' to this capacity problem has been to buy and implement even more capacity. Storage is relatively inexpensive, and the cost of adding more can seem like a quicker and easier way out of a capacity problem than the alternatives. Also, storage systems have advanced to the point that physically connecting more storage is less disruptive than it once was. Even though storage capacity continues to be inexpensive, there is a ripple effect to its addition that impacts overall efficiency, and this ripple effect is now haunting data centers. IT organizations can no longer keep addressing the problem by adding more and more capacity.

Even if the additional storage can be cost-justified, there is also the threat of running out of data center floor space or cooling capability, and the very real cost of managing that additional capacity. Interestingly, when only these 'support' costs are considered, the rate at which storage is now growing is still outpacing its reduction in cost per GB.

In short, IT has to do more with the capacity it already has, and maybe even shrink that down to a more manageable size. The costs of 'care and feeding' for each additional GB of storage are simply too great. IBM Real-time Compression, which addresses storage expansion at its root cause, primary storage, may be the ideal solution.
It not only reduces immediate storage growth, but also has the long-term impact of increasing the efficiency of storage administrators and reducing costs.

Primary Storage Growth's Ripple Effect

When primary storage grows, it's not just the capacity that's added to the storage system upfront, it's the impact of those additions. Physically adding more shelves to a storage system increases its footprint. Data center floor space is becoming one of the most expensive resources in IT, along with power and cooling. Each additional shelf requires more power and cooling and reduces air flow, driving the need for even more cooling, which in turn drives up power usage further.

The next area of concern is the impact on the time efficiency of the IT staff that manages storage and the time lost by users while they wait for that storage to come online. Even if storage can be successfully added to a live storage system, human decisions need to be made, and that's where the dynamic nature of storage expansion comes to a grinding halt. First, it must be decided how the new capacity will be provisioned. Then there is the administrative overhead generated by either provisioning new volumes or extending existing ones. If new volumes are created, it needs to be determined how large each volume will be and which server it will be assigned to, and then modifications need to be made to those attaching servers so that they can mount the new volumes. If volumes are to be extended, there may be downtime associated with that process as well.

This provisioning process takes time if it's to be done accurately, and if accuracy is sacrificed for speed, even more inefficiency creeps into the environment. Most data centers report that it typically takes a week to a month to provision new storage once it has been received on the loading dock. This is time that the users of these applications simply may not have, and it results in either the inaccuracy described above or many late nights for the IT staff, as well as dissatisfaction on the part of users.

The Snapshot Ripple Effect

There is also the impact of clones, which leverage snapshots. Snapshots were formerly a tool used exclusively for data protection; clones are the production use of those snapshots, reducing the capacity required to deploy new volumes by making snapshot copies writable. While snapshots are space-efficient by design, the more of them that are in place, and the further they diverge from the original, the more growth occurs.

In a VMware example, a snapshot may be used to move a virtual machine to a prior state, whereas a clone will be used to build additional VMs using the base image as a master, or 'golden', image. Both are space-efficient but do incur growth as the snapshot ages or the clone is personalized and changed from the master. Multiply these small growth additions across dozens or even hundreds of VMs and there is the potential for a loss in storage efficiency. Also, these are net new changes, so other data efficiency techniques, like deduplication, won't be effective against this growth.

Increasing Primary Storage Efficiency with IBM Real-time Compression

As stated above, the problem is not just that more primary storage capacity has to be bought and paid for; it's also that adding this primary storage adds costs in time, space and other resources. The answer may be to increase primary storage optimization through the use of storage efficiency technologies like IBM Real-time Compression. Potentially the best place to improve this efficiency is upfront, at its source, as data is being written to and read from primary storage. This is what real-time compression does.

IBM Real-time Compression optimizes data before it's ever stored on the hard disk, providing up to a 5x space reduction. This requires that the device performing the optimization be placed inline between the servers and the storage.
As data goes through the IBM Real-time Compression Appliance it's compressed and then sent to the network attached storage (NAS) device.

Compressing a data stream increases the 'effective throughput', as the same amount of information is contained in less data and less physical space. This means that all the components of the storage system have to handle less data and, as a result, instantly perform better and more efficiently. The effective bandwidth between the device and the storage, and between the storage system and the shelves, increases. The effective capacity of the cache in the storage increases, and even the efficiency of the drives improves, since more data can be collected on each rotation of the drive platters. The result is that even though the optimization device is compressing all data inline, it does so without performance impact, in most cases actually delivering a performance improvement.

Finally, there is also the obvious gain in storage capacity utilization. As stated above, the demand for increased capacity, especially when you factor in the cost of power and cooling, may be outpacing the expected cost reductions of that capacity. This means that, even on a hard cost basis, it's no longer "cheaper" to buy more storage than to invest in efficiency.

There is also an efficiency effect in other primary storage functions, like the overhead associated with RAID parity and the extra space required by clones. With IBM Real-time Compression the capacity required to store cloned volumes is reduced by up to 80%. This means that clones can typically be maintained for a longer period of time, since they will require less disk capacity. It also means that updates to the clones, driven by changes, will occur faster, since less actual data has to be modified.

Inline compression solutions, like IBM's Real-time Compression Appliances, provide what could potentially be the fastest and simplest way to increase storage efficiencies. For example, using the 5x compression ratio, the data on a 100TB storage system containing 100TB of data could be reduced to 20TB.
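
The arithmetic behind the 100TB example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the quoted "up to 5x" ratio applies uniformly to all stored data, which real workloads may not achieve.

```python
def compressed_footprint(stored_tb: float, ratio: float) -> float:
    """Physical capacity consumed once data is compressed at the given ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return stored_tb / ratio


def freed_fraction(ratio: float) -> float:
    """Fraction of the original footprint that compression frees up."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


# Figures from the text: a full 100TB system and an assumed uniform 5x ratio.
system_tb = 100.0
used_tb = compressed_footprint(100.0, 5.0)   # 20.0 TB still occupied
headroom_tb = system_tb - used_tb            # 80.0 TB available for growth
print(f"used: {used_tb} TB, headroom: {headroom_tb} TB, "
      f"freed: {freed_fraction(5.0):.0%}")
```

At a lower ratio the headroom shrinks quickly: a 2x ratio would leave the same system half full rather than 20% full.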

Besides creating another 80% of 'instant capacity', using real-time compression also reduces by 80% the amount of data handled through every process in the data stream. This results in lower power, cooling and floor space consumption, as well as less time and energy spent on detailed implementation and provisioning plans by storage administrators. By storing more information in the same data space, less provisioning work has to occur, making IT staff more efficient.

Imagine a 100TB system that was 100% full now being only 20% full. That means 80TB of additional growth before a new storage system needs to be implemented. That in turn leads to an efficiency gain in all the other areas mentioned. No additional floor space, power and cooling needs to be consumed, nor do storage administrators need to spend time working up detailed implementation and provisioning plans.

Smarter Data Protection

Data growth is occurring in the data protection process as well as in primary storage. While applying IBM Real-time Compression at the primary storage level helps with capacity growth, real-time compression brings its own unique value in storing protected data and, similar to primary storage, has its own 'ripple effect' in other areas of the infrastructure. The growth in data protection storage is not only caused by the growth of primary storage, but also by growth in the number of redundant instances of data now found in the data protection process.

For example, a snapshot of data is often taken on primary storage and then replicated to a secondary location for disaster preparedness. The primary data is then backed up by traditional backup applications daily and weekly to a disk storage area. Then, the backup jobs themselves are often replicated by a disk backup appliance to a DR location, where the data is finally copied to tape.
While each of these processes may have its own optimization capabilities, data has to be 're-inflated' or 'de-optimized' before it can be moved between processes and storage types. IBM Real-time Compression can improve the efficiency of the individual optimization steps that may exist for each of the processes and make the transport between them more effective.

The Data Protection Ripple Effect

Backup Software Ripple Effect

Today in the enterprise most backups are network-based, meaning that all the data has to be moved across the network to the backup server. While slower, it is significantly more cost effective than direct-attached or fibre-attached storage. When compared to WAN replication, the available local area network bandwidth may seem huge, but it's not when considering that most applications don't have the ability to back up only changed blocks. While some have the intelligence to perform incremental backups (changed files only) and then merge them, most have to back up the entire data set.

Snapshot Growth

Depending on the storage system or operating system, snapshots, when they're taken, typically set the current blocks of storage to read-only. Then, as users make changes to data that would affect these blocks, a new block is stored to represent the active, up-to-date data. Snapshots then leverage this read-only collection of blocks to represent how the data looked at that point in time. When the snapshot has expired, the read-only blocks that changed are released and returned to the storage pool, available to be written over. As a result snapshots are space-efficient, the only growth occurring when a block is added or modified. But the longer snapshots are held and the more of them that are taken, the more space has to be reserved for their use. With most systems, as the reserve areas begin to run out of space, snapshot completion times grow longer, and snapshots will cease altogether if there is no available reserve space.

Replication Growth

For most systems snapshots are also the core technology for an off-site replication feature. They leverage the same changed-block tracking technique to know which blocks should be sent across the wide area network (WAN) to update the storage system at the DR site. Again, while snapshots may be space-efficient in an active system, multiple snapshots combined with a low WAN transfer speed can become a significant bottleneck, because even the small growth of snapshots is more than many WAN segments can sustain. This causes the two systems to be out of sync for an increasing amount of time; in some cases, they may never catch up.

There is also the issue with the DR site, where storage often must be a mirror image, in size and capacity, of the local storage system. This means that when capacity is added to the primary system it must also be added to the DR system.

The Weaknesses of Deduplication Only

In many environments local backups now use a combination of disk and tape to store data. While disk does improve performance, the most popular disk-based solutions are attached via the IP network, as are most of the servers being backed up. These disk-based backup solutions have almost all added deduplication to improve backup efficiency, but most do so after the data has been sent across the LAN and received by the backup device.

A second weak point in deduplication is that it only works if there is redundant data available; net new data typically won't deduplicate well.
As a result, disk backup deduplication systems are inefficient when transferring data between their own devices, and could stand to be more efficient when storing data to their devices as well.

The Remote Vaulting Ripple Effect

Remote backups are becoming an increasingly popular method to electronically move data off-site, instead of copying data to tape and shipping the cartridges to a vault. The requirements of remote backup are similar to those of the replication described above. The ability to optimize the WAN connection between the two locations is important, as is the ability to reduce the storage footprint of data at the remote location. Again, deduplication only works on redundant data and in most cases requires identical devices.

The Restoration Ripple Effect

The final area to consider is the impact on restoration from all these different devices. Snapshots probably have the least restoration impact, since they can be directly mounted and in some cases directly used by the application. Their challenge is the length of time they're retained and the fact that they're typically only suitable for recovering the most recent copy of data. If a point in time further back is required, another backup storage source must be used. The problem with these remaining storage sources is the speed at which they can deliver data. They're not only constrained by the network they happen to be on but are also inhibited by the storage optimization schemes that they use. Deduplicated data must be reassembled on the fly as it's restored to the recovery location, something which adversely impacts performance in almost every case.
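
The two deduplication weaknesses described above, that net new data does not deduplicate and that restores must reassemble data from stored pieces, can be illustrated with a toy chunk store. This is a minimal sketch under stated assumptions: fixed-size chunks, SHA-256 fingerprints, an in-memory dictionary as the store, and hypothetical names throughout. It does not represent how any particular backup appliance works.

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size in bytes


def dedup_store(data: bytes, store: dict) -> list:
    """Store each unique chunk once; return the recipe (ordered fingerprints)."""
    recipe = []
    for offset in range(0, len(data), CHUNK):
        chunk = data[offset:offset + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy consumes space
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original data chunk by chunk, as a restore must do."""
    return b"".join(store[digest] for digest in recipe)


store = {}

# Highly redundant data: eight identical chunks collapse to one stored chunk.
redundant = b"A" * (CHUNK * 8)
recipe_redundant = dedup_store(redundant, store)
chunks_after_redundant = len(store)  # 1

# Net new data: eight distinct chunks, none seen before, so nothing is saved.
net_new = b"".join(i.to_bytes(4, "big") * (CHUNK // 4) for i in range(8))
recipe_new = dedup_store(net_new, store)
chunks_after_new = len(store)  # 9

print(chunks_after_redundant, chunks_after_new)
```

Every restore walks the recipe and looks up each chunk, which is the "reassembled on the fly" step the text refers to.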

Increasing Data Protection Efficiency with IBM Real-time Compression

IBM Real-time Compression is an ideal solution for most of these data protection challenges, and when implemented alongside the existing solutions it can increase efficiency across every aspect of the data protection process, not just in storage capacity. What makes IBM Real-time Compression unique is that, unlike other capacity optimization solutions, it has the ability to keep data in an optimized state throughout the data protection workflow. In a data protection deployment, the IBM Real-time Compression Appliance sits in front of the primary storage. Any data that is "behind" the appliance will stay in a compressed state and gain the efficiencies of IBM Real-time Compression, typically up to 5x compression rates.

Impact of more efficient local backups

IBM Real-time Compression improves backup software network performance by keeping the data in a compressed state as it moves from primary storage to backup storage. In addition to the performance gains on the network and the capacity gains in backup storage, the backup server itself also sees a performance improvement. It has less data to handle, that data is already in a compressed format, and it doesn't have to wait as long to confirm that the data has been written to backup storage. The result should be that real-time data compression not only makes the backup process faster but also extends the useful life of the backup server itself.

Impact of more efficient snapshots

Another area of positive impact is with snapshots and clones. Snapshots are references to previous views of data at certain points in time and are typically used for data protection and recovery. A snapshot may be used to move a volume or virtual machine to a prior state. Like clones they are space-efficient, but do incur growth as the snapshot ages.
With IBM Real-time Compression the capacity required to store snapshotted or cloned volumes is reduced by up to 80%. This means that snapshots can typically be maintained for a longer period of time, since they will require less disk capacity.

Impact of more efficient WAN replication

Disaster Recovery (DR) capability is an important item on every IT agenda, and the foundation of a DR plan is making sure that data is available off-site. This is typically done by a WAN replication process available within the storage system. Despite the intelligence of block replication, the speed limitations of the typical WAN can impact how "in-sync" the remote location is. In busy environments, with lots of changed data and a slow WAN connection, the DR site could be dozens of minutes, or even hours, out of sync with the primary location. Leveraging IBM Real-time Compression once again reduces the typical data size by up to 5x, effectively providing up to 5 times more bandwidth, and helping a DR site that's 30 minutes out of sync to become 100% in-sync.

Impact of more efficient remote backups

Remote vaulting of backup data is becoming increasingly popular compared with the older 'tape and truck' method. It has been enabled in large part by data deduplication devices, which only replicate net new blocks of data that they receive, similar to snapshots. They create an important second tier in DR strategies, after the replication of primary volumes described above. The replication process that these deduplication systems use is limited by the available bandwidth on the WAN and, once again, is helped by real-time compression. In fact, even in cases where the deduplication appliance can perform compression, products like IBM Real-time Compression make that process more efficient.

Real-time compression should not be viewed as a competitor to traditional deduplication, but as a complement. It improves transfer performance across the network to the deduplication device, improves the storage efficiency of the device, improves the deduplication analysis performance of the device and improves the WAN replication capabilities.

Impact of more efficient restores

The real benefit of all of the efficiencies that IBM Real-time Compression brings to the backup process is in the restore process. Compared with traditional backup applications, the increase in the speed with which data can be located, pulled from the backup device and sent through the backup server to its final destination is significant. It's not atypical for restore jobs to take hours, and dividing that restore time by 5 can mean an application is back online in one hour instead of five.

Product Analysis - What Is The IBM Real-time Compression Appliance?

IBM Real-time Compression is a technology that is hosted on an appliance, available in two models sized for the workload that a data center needs to optimize. The STN6500 is ready to deploy into 1 Gb Ethernet NAS environments and has 16 ports, supporting 8 connections between NAS systems and network switches. The STN6800 can be customized with multiple 10 Gb and 1 Gb Ethernet ports to support high throughput requirements. It can have up to four 10GbE connections for high throughput environments or can mix two 10GbE and four 1GbE connections for greater flexibility.

Implementation of IBM Real-time Compression is straightforward. Effectively, the appliances sit between the network switches and the storage devices or NAS heads.
As the name implies, compression happens inline, prior to the data being stored on the NAS disk. There are no changes required to the storage systems or shared volumes that the NAS is hosting. Once activated, as data is written to the NAS systems, the path is now through the IBM Real-time Compression Appliances.

With inline compression in place, the data center will come to count on its space savings and efficiency gains. It will become important for some environments that the IBM Real-time Compression Appliances support high availability. For those situations there is the ability to add a second unit and leverage automated failover between those units.

While it seems that adding the extra step of compression should degrade performance, the opposite is actually the case. The expense required to compress the data via these dedicated appliances is less than the gain seen by the storage infrastructure, because it now has to deal with less data.

Summary

The unprecedented data growth seen by data centers is causing more than just storage budgeting problems for IT managers. It is costing the efficiency of employees and processes. With IBM's Real-time Compression Appliances, data centers can regain control over their data problems. With its implementation they will see a ripple effect as the benefits of increased efficiencies within the production and data protection processes spread from primary storage throughout the infrastructure. The net result will be an improvement in the productivity of personnel and the ability of those processes to keep up with user demands.
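
As a rough illustration of the 'effective throughput' argument made earlier in the paper, the sketch below measures a compression ratio on sample data and applies it to a link speed and to a restore time. Python's standard `zlib` module is used purely as a stand-in; it is not the algorithm used by the IBM appliance, and the sample payload, the 100 MB/s link figure and the function names are all assumptions chosen for the example.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Ratio of original size to compressed size, using zlib as a stand-in."""
    compressed = zlib.compress(payload, level)
    # Lossless round trip: the receiver recovers exactly what was sent.
    assert zlib.decompress(compressed) == payload
    return len(payload) / len(compressed)


def effective_throughput(link_mb_per_s: float, ratio: float) -> float:
    """Logical data moved per second when the link carries compressed data."""
    return link_mb_per_s * ratio


def transfer_hours(hours_uncompressed: float, ratio: float) -> float:
    """Transfer time when only 1/ratio of the bytes has to cross the link."""
    return hours_uncompressed / ratio


# Hypothetical, highly repetitive sample; real data will compress differently.
sample = b"customer_id,order_id,status\n" * 10_000
measured = compression_ratio(sample)
print(f"measured ratio on sample data: {measured:.1f}x")

# The paper's own figures: an 'up to 5x' ratio and a five-hour restore.
print(effective_throughput(100.0, 5.0))  # 500.0 MB/s of logical data
print(transfer_hours(5.0, 5.0))          # 1.0 hour
```

The sketch treats the link as the only bottleneck and ignores the time spent compressing, which the paper argues is outweighed by the reduced data volume.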