Data Reduction Technologies Impact the Cloud: What You Need to Know


This presentation outlines the trends and forces driving the need to use the cloud as a tier in data protection for virtualized environments. We’ll discuss the data protection partnership required from the source of the data (virtual machine disks and the files inside virtual machines) all the way to the cloud in order to achieve data protection goals. This includes examples of how content-specific data reduction techniques for virtualized disks can provide immediate value as the cloud becomes the next great tier in the data center storage hierarchy. Finally, we will examine the challenges that arise as data reduction becomes more valuable as a wire-usage optimization than as a disk-usage optimization.

Published in: Technology


  1. Cloudy With a Chance of Data Reduction: How Data Reduction Technologies Impact the Cloud. Mitch Haile, Technical Director – Virtualization, October 2011
  2. Overview: The World Has Changed; Life of a File; On the Wire
  3. The World Has Changed
  4. Unstructured FILE Data Growth: Over-committing storage infrastructure & budget. By 2015, 80% of file data in virtualized server environments (by capacity). Management nightmare. Sources: ESG Digital Archive Study, June 2010; IDC, Virtualized Environments, August 2010; NetApp Letter to Stockholders, Nov 2009
  5. Structured Data: Virtual Disks
  6. Structured Data: Virtual Disks, Databases, Email, Movies
  7. Data Usage: Hot, Warm, DR, Compliance
  8. Cheaper Storage: 1 Petabyte retail = $39,995
  9. MTBF (size per spindle vs. MTBF) • SATA disk: look at errors/disk capacity • 2 TB disk → 16% chance of hard error • 4 TB disk → 32% chance of hard error • ~2015, disks expected to be 8–16 TB → 100% chance of error during whole disk read. Source: Steve Hetzler, IBM, FAST ’11
  10. Power TCO: Cost of powering 1 drive = from ~$14 to ~$33 per year. Cost of cooling 1 drive = ~$33 per year. If your kWh cost is ~13¢, you can save ~$28,000/Petabyte/Year on just power/cooling.
  11. “If I can do it at home …”
  12. The Cost of Moving Data: Tolls, Throttled
  13. Data Volume: Cloud
  14. Perfect Storm
  15. Observations: Lots of copies of data you will never use again. Economies of scale will help drive cloud adoption. Optimizing the wire will dominate. Optimizing the wire requires data reduction.
  16. LIFE OF A FILE
  17. Old Life of a File. When: On | Backup | Monthly. Where: Source | LAN | LAN. When: Until deleted or changed or expired | One Week | Forever
  18. Life of a File Today. When: On Write | Or Tonight | This weekend | End of Month | Time Or …. Where: Source | LAN | WAN | Cloud | Cloud. How Long: Until deleted or changed or expired | One Week | One Month | Six Months | Seven Years
  19. Life of a File.2
  20. Life of a File: Data Reduction. Encryption, Compression, Filters, Data-specific Deduplication, Data Pipeline, Data Specific. Data reduction starts here; otherwise opportunities are lost. Where: Source | LAN | WAN | Cloud | Cloud
  21. Email: Split the Data. Fixed Reduction: LAN, WAN, Cloud. Meta-Data: Everywhere. Plain Copy: LAN. Variable Reduction: WAN, Cloud
  22. Virtual Disks: Split the Data. Fixed Reduction: Source. Plain Copy: Source. Variable Reduction: Source, LAN, Cloud. Variable Reduction: Cloud. Meta-Data: Everywhere. [Diagram labels: swap, Temp, File]
  23. ON THE WIRE
  24. Data Reduction and the Cloud. For the front end: – Bandwidth reduction value – Latency improvement – Overall cost because of billing model. For the back end: – Storage footprint – Data is copied around a lot (bandwidth saving). But there are issues: – Need for homogeneous solutions – Must be woven into DP and DM activities – Must be automatic; sizing can be a big issue
  25. Dedupe-Aware Transfers. Dedupe-aware transfers disruptively reduce the amount of data passed between the tiers: – Dedupe negotiation protocols are required – These protocols can be layered over compression – These protocols must efficiently handle two cases: (1) data being pushed from the edge into a repository; (2) data being replicated between repositories. [Diagram labels: Primary Data Center; Remote; DR Facility; Tape Library; Efficient Data Transfer; Efficient Data Replication; Storage Systems; Database, Mail, Files, Virtual; Migration, Integration, Encryption, Indexing, Self-healing]
  26. Data Reduction “On-the-Wire”. Multiple considerations when moving data over the wire: – Is data being moved between a data-reduced repository and a traditional “raw” system? – Is data being moved between two systems with the same reduction technology? – Can multiple data reduction technologies be employed at each stage of movement? Mixing file-level and block-level solutions is problematic; often, mixing NAS and VTL demonstrates similar problems. What media must the data be moved over: high-latency or low-latency? Each data reduction scheme has benefits and downsides in each of the above scenarios. There is no “free lunch”: somehow, somewhere you have to dehydrate and rehydrate the data!
  27. Deduplication On-the-Wire. Most dedupe vendors offer dedupe-enabled replication, but there is a lot of variance. Most are somewhat complex forms of a simple model: – Client batches up a group of sequential chunk fingerprints – Client sends the batch to a smart target that can query the existence of each fingerprint – Target sends back results and the client pushes unique data. The above scheme only works when client and server can both form identical chunks and fingerprints. Collaborative dedupe schemes are less common; these schemes provide a method that allows the client to chunk and fingerprint data to enable the negotiation. Collaborative schemes don’t work over the old legacy protocols (NAS); that’s starting to change (OST/XAM/pNFS).
  28. Deduplication On-the-Wire. Benefits and costs are more subtle: – Most dedupe solutions send a file/object-level hash of hashes to prune copies, similar to SI technologies – Some solutions provide hierarchical hash-of-hashes to obviate the transfer of large ranges – Most solutions can negotiate individual chunks – Solutions that negotiate all (or most) chunks perform a large number of hash negotiations: • Excellent results when most actual data transfer is obviated • Negotiation can add to transfer overhead when dedupe ratios are low • Cost of hash negotiations serializes data transfers; this can be invisible on low-latency wires but cause significant slowdowns on high-latency wires
  29. Data Reduction EARLY. Days → Weeks → Months → Years. [Diagram labels: Division/Remote; Primary Data Center; DR Facility; Archive; Centralized Management; Midrange Storage Solution; Server-based Solution; Enterprise Storage Solution; Managed Service Solution; Open Replication; Files, Virtual, Database, Mail; Core]
  30. Other Details: Federation, Policy, Data Movement
  31. Conclusion: Forces are driving us to Cloud more quickly. It’s all about the wire. Ubiquitous data reduction.
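
The hard-error percentages on slide 9 (2 TB → 16%, 4 TB → 32%) are consistent with a simple linear estimate: bits read in one full pass divided by the number of bits read per unrecoverable error. The sketch below reproduces that arithmetic. The error rate of one unrecoverable error per 10^14 bits is an assumption chosen because it matches the slide's figures; the slide itself does not state the rate, and the function names are illustrative.

```python
def expected_read_errors(capacity_tb: float, bits_per_error: float = 1e14) -> float:
    """Expected unrecoverable read errors during one full read of the disk.

    bits_per_error is an ASSUMED rate (one error per 1e14 bits read);
    the slide does not state which rate it used.
    """
    bits_read = capacity_tb * 1e12 * 8  # decimal terabytes -> bits
    return bits_read / bits_per_error


def chance_of_hard_error(capacity_tb: float, bits_per_error: float = 1e14) -> float:
    """Linear estimate of the chance of at least one error, capped at 100%."""
    return min(1.0, expected_read_errors(capacity_tb, bits_per_error))


if __name__ == "__main__":
    for tb in (2, 4, 8, 16):
        print(f"{tb} TB disk: ~{chance_of_hard_error(tb):.0%} chance of hard error")
```

The linear estimate slightly overstates the risk for large disks; under an independent-errors model the probability would be closer to 1 − e^(−x), where x is the expected error count. The trend the slide points to is the same either way: the chance of hitting an error during a whole-disk read grows with capacity.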
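
The "simple model" on slide 27 (the client batches fingerprints, the target reports which ones it already holds, and the client pushes only the unique chunks) can be sketched in a few lines. This is a minimal in-memory illustration, not any vendor's protocol: the `Target` class, the `replicate` function, fixed-size chunking, and SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Fingerprint a chunk; both sides must use the same function."""
    return hashlib.sha256(chunk).hexdigest()


def fixed_chunks(data: bytes, size: int) -> list:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Target:
    """Hypothetical 'smart target' that can answer fingerprint queries."""

    def __init__(self):
        self.store = {}

    def query(self, fingerprints):
        """Report, per fingerprint, whether the chunk is already stored."""
        return [fp in self.store for fp in fingerprints]

    def put(self, chunk: bytes) -> None:
        self.store[fingerprint(chunk)] = chunk


def replicate(data: bytes, target: Target, chunk_size: int = 4, batch_size: int = 8) -> int:
    """Push data to the target and return the number of chunk bytes sent."""
    chunks = fixed_chunks(data, chunk_size)
    bytes_sent = 0
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        fps = [fingerprint(c) for c in batch]   # client batches fingerprints
        known = target.query(fps)               # target reports what it has
        pushed = set()                          # avoid resending duplicates within a batch
        for chunk, fp, is_known in zip(batch, fps, known):
            if not is_known and fp not in pushed:
                target.put(chunk)               # client pushes unique data only
                pushed.add(fp)
                bytes_sent += len(chunk)
    return bytes_sent


if __name__ == "__main__":
    target = Target()
    print(replicate(b"AAAABBBBAAAACCCC", target))  # first transfer
    print(replicate(b"AAAABBBBDDDD", target))      # mostly duplicate data
```

The sketch also shows why the slide says the scheme only works when both sides form identical chunks and fingerprints: if the client chunked at a different size or hashed with a different function, every query would miss and all data would be sent.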
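
Slide 28's point about negotiation cost can be made concrete with a rough timing model. The sketch below assumes each batch of fingerprints costs one serialized round trip, and that fingerprints plus unique chunks share one link. The function name, parameters, and the sample numbers (chunk size, link speed, round-trip times) are illustrative assumptions, not measurements from the presentation.

```python
import math


def transfer_seconds(total_chunks: int, chunk_bytes: int, unique_fraction: float,
                     rtt_s: float, bandwidth_bps: float, batch_size: int = 100,
                     fingerprint_bytes: int = 32, negotiate: bool = True) -> float:
    """Rough transfer-time estimate with or without hash negotiation.

    Assumes one serialized round trip per fingerprint batch and ignores
    protocol headers, compression, and pipelining.
    """
    if not negotiate:
        # Plain copy: every chunk crosses the wire, no round trips modeled.
        return total_chunks * chunk_bytes * 8 / bandwidth_bps
    batches = math.ceil(total_chunks / batch_size)
    unique_chunks = round(total_chunks * unique_fraction)
    wire_bytes = total_chunks * fingerprint_bytes + unique_chunks * chunk_bytes
    return batches * rtt_s + wire_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    common = dict(total_chunks=100_000, chunk_bytes=8192, bandwidth_bps=100e6)
    raw = transfer_seconds(unique_fraction=1.0, rtt_s=0.0, negotiate=False, **common)
    cases = {
        "low latency, 5% unique": transfer_seconds(unique_fraction=0.05, rtt_s=0.001, **common),
        "low latency, 100% unique": transfer_seconds(unique_fraction=1.0, rtt_s=0.001, **common),
        "high latency, 5% unique": transfer_seconds(unique_fraction=0.05, rtt_s=0.1, **common),
    }
    print(f"plain copy: {raw:.1f} s")
    for name, seconds in cases.items():
        print(f"{name}: {seconds:.1f} s")
```

Under these assumptions the model reproduces the three sub-points on the slide: negotiation wins clearly when most data is already at the target, adds overhead when almost nothing dedupes, and can lose to a plain copy on a high-latency link because the serialized round trips dominate. Larger batches or pipelined negotiation would change the balance.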