sSE20
   Data Footprint Reduction:
   Understanding IBM Storage
   Efficiency Options
   Tony Pearson
   Master Inventor and Senior Managing Consultant, IBM Corp


   Sanjay S Bhikot
   Advisory Unix and Storage Administrator, Ricoh Americas Corp




#IBMEDGE          © 2012 IBM Corporation
Data Footprint Reduction is the
    catch-all term for a variety of
    technologies designed to help
    reduce storage costs. This session
    will cover thin provisioning, space-
    efficient copies, deduplication and
    compression technologies, and
    describe the IBM storage products
    that provide these
    capabilities.




Sessions -- Tony Pearson
   • Monday
           – 1:00pm           Storing Archive Data for Compliance Challenges
           – 4:15pm           IBM Watson: What it Means for Society
   • Tuesday
           – 4:15pm           Using Social Media: Birds of a Feather (BOF)
   • Wednesday
           – 9:00am           Data Footprint Reduction: IBM Storage options
           – 2:30pm           IBM's Storage Strategy in the Smarter Computing era
           – 4:15pm           IBM SONAS and the Cloud Storage Taxonomy
   • Thursday
           – 9:00am IBM Watson: What it Means for Society
           – 10:30am Tivoli Storage Productivity Center Overview
           – 5:30pm IBM Edge “Free for All” hosted by Scott Drummond


Agenda

            •     Thin Provisioning
            •     Space-Efficient Copy
            •     Data Deduplication
            •     Compression




History of Thin Provisioning

 1994    The StorageTek Iceberg 9200 Array introduced Thin Provisioning
         on slower 7200 RPM drives for mainframe systems
 1997    IBM resold this as the RAMAC Virtual Array (RVA) for
         mainframe servers
 Today   Thin Provisioning is available for many operating systems on
         IBM storage, including DS8000, XIV, SVC, N series,
         Storwize V7000, DS3500 and DCS3700


Why Space is Over-Allocated
• Scenario 1
     – Space requirements under-estimated
     – Running out of space requires larger volume
     – New request may take weeks to accommodate
           • Application outage if not addressed in time
     – Data must be moved to the larger volume
           • Application outage during data movement
• Scenario 2
     – Space requirements over-estimated
     – Capacity lasts for years
           • No data migration
           • No application outages
           • No penalties

When faced with this dilemma, most will err on the side of over-estimating

Fully Allocated vs. Thin Provisioned
                                      Allocated but unused space
                                      dedicated to this host,
                                      wasted until written to
    Host sees fully
    allocated amount                  Actual data written




                                      Empty space available to others

                                      Physical Space Allocated
    Host sees full
    virtual amount                    Actual data written




Fully Allocated vs. Thin Provisioned


                                     Volume/LUN – one or more extents

Host sees a volume
or LUN that consists                 Extent – Allocation Unit
of blocks numbered                   One or more grains
0 to nnnnnnnnnn

                                     Grain – range of 1 or more blocks



                                     Block – typically 512 or 4096 bytes




Coarse and Fine-Grain

Blocks 00, 55, and 99 written (10 x 10 grid of blocks, one row of 10 blocks per extent):
     • Fully Allocated: all 10 extents allocated
     • Coarse-Grain: only 3 extents allocated (rows 0, 5, and 9; grain 90-99 = extent)
     • Fine-Grain: only 1 extent allocated (grains 00-01, 54-55, and 98-99)

[Figure: three block grids labeled Fully Allocated, Coarse-Grain, and Fine-Grain]
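
The extent counts on this slide can be reproduced with a small calculation. This is only a sketch: the sizes of 10 blocks per extent and 2 blocks per grain are assumptions read off the grid and the grain labels, and the function names are invented for illustration.

```python
import math

BLOCKS_PER_EXTENT = 10  # assumed: one row of the 10 x 10 grid
BLOCKS_PER_GRAIN = 2    # assumed: grain labels such as 54-55 span two blocks

def extents_fully_allocated(total_blocks):
    # Every extent of the volume is allocated up front.
    return math.ceil(total_blocks / BLOCKS_PER_EXTENT)

def extents_coarse_grain(written_blocks):
    # A whole extent is allocated the first time any block in it is written.
    return len({b // BLOCKS_PER_EXTENT for b in written_blocks})

def extents_fine_grain(written_blocks):
    # Only the touched grains are allocated, packed together into extents.
    grains = {b // BLOCKS_PER_GRAIN for b in written_blocks}
    return math.ceil(len(grains) * BLOCKS_PER_GRAIN / BLOCKS_PER_EXTENT)

written = [0, 55, 99]
print(extents_fully_allocated(100))   # 10
print(extents_coarse_grain(written))  # 3
print(extents_fine_grain(written))    # 1
```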
How IBM has implemented TP

                        IBM DS8000   IBM XIV   SVC and     DS3500,
                                               Storwize    DCS3700
                                               V7000
    Type                Coarse       Fine      Fine        Fine

    Allocation          1 GB         17 GB     16MB to     4 GB
    Unit                                       8GB

    Grain size                       1 MB      32-256 KB   64 KB




Thick-to-Thin Migration


                                                    Volume
            Fully-allocated                          mirror              Thin-
            volume
                                                                  provisioned
                                                                      volume
                                      Copy 0          Copy 1




                                    Only non-zero blocks copied

Empty Space Reclaim

     • Thin provisioning: allocations in 17 GB units, with 1 MB chunks
       (grains). Only non-zero blocks consume physical space.

     • Avoid writing empty blocks: any I/O request that tries to write a
       block of all zeros to unallocated space is ignored.

     • Background task to find empty chunks: a background task scans all
       blocks, looking for chunks containing all zeros.

     • Empty space reclaimed: empty chunks are returned to unallocated
       space, so that they can be used for other volumes.
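
The zero-detection and reclaim behavior described above can be sketched as a toy in-memory model. The class name `ThinVolume` and its methods are hypothetical and not any product's API; for simplicity each write covers exactly one chunk.

```python
class ThinVolume:
    """Toy thin-provisioned volume: only allocated chunks use memory."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk index -> bytes, allocated chunks only

    def write(self, index, data):
        all_zeros = not any(data)
        if all_zeros and index not in self.chunks:
            return  # zero write to unallocated space is ignored
        self.chunks[index] = bytes(data)

    def read(self, index):
        # Unallocated chunks read back as zeros.
        return self.chunks.get(index, bytes(self.chunk_size))

    def reclaim(self):
        # Background scan: free allocated chunks that contain only zeros.
        empty = [i for i, chunk in self.chunks.items() if not any(chunk)]
        for i in empty:
            del self.chunks[i]
        return len(empty)

vol = ThinVolume(chunk_size=4)
vol.write(0, b"\x00" * 4)   # ignored, nothing allocated
vol.write(1, b"data")       # allocates chunk 1
vol.write(1, b"\x00" * 4)   # chunk 1 stays allocated but now holds zeros
print(len(vol.chunks))      # 1
print(vol.reclaim())        # 1
print(len(vol.chunks))      # 0
```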


Thin Provisioning
• Pros
     – Just-in-time allocation, increased utilization percentage
     – Eliminates the pressure to make accurate space estimates
     – Dynamically expand volume without impacting applications or
       rebooting server
     – Reduces the data footprint and lowers costs
     – Shifts focus from volumes to storage pool capacity
• Cons
     – Not all file systems are cooperative or friendly
           • Deletion of files does not free space for others
           • “sdelete” writes zeros over deleted file space
     – Some implementations may impact I/O performance
     – May not support same set of features, copy services, or replication
     – “Writing checks you can’t cash”

Agenda

            •     Thin Provisioning
            •     Space-Efficient Copy
            •     Data Deduplication
            •     Compression




History of Space-Efficient Copies



 1993    NetApp introduces Snapshot in its WAFL file system
 1997    IBM Enterprise Storage Server (ESS) introduces NOCOPY parameter
         on FlashCopy
 Today   Space-Efficient Copy is available on many IBM storage systems,
         including DS8000, XIV, SVC, N series, Storwize V7000,
         DS3500, DS5000 and DCS3700

Space-Efficient Copies
                                                           300 GB


    Source
                                                       Traditional Copies




                                      Destination 1     Destination 2       Destination 3


 100 GB allocated
  40 GB written                                Space-Efficient Copies. 10% reserved




                                                           30 GB
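
The two totals on this slide follow from simple arithmetic. A minimal sketch, assuming the 10% reserve is taken against the 100 GB allocated source for each of the three destinations (the function names are illustrative only):

```python
def traditional_copies_gb(allocated_gb, copies):
    # Each traditional copy is as large as the allocated source.
    return allocated_gb * copies

def space_efficient_copies_gb(allocated_gb, copies, reserve_percent):
    # Each space-efficient copy reserves only a percentage of the source.
    return allocated_gb * copies * reserve_percent / 100

print(traditional_copies_gb(100, 3))          # 300
print(space_efficient_copies_gb(100, 3, 10))  # 30.0
```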
Method 1: Copy on Write (COW)
[Figure: before the write, Source holds blocks A B C D; after the write, Source holds A B C2 D and Destination holds the original C]

• Copy-On-Write (COW)
     – Copy is set of pointers to original data
     – Write to original volume:
           • Pause I/O
           • Copy original block of data to destination
           • Update original block
     – Slows performance
     – May limit # of destination copies
     – Can be combined with background copy for a full copy
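
The copy-on-write steps above can be sketched with a toy in-memory model. The class names are hypothetical, and a real array would work on disk blocks and pause host I/O while the original block is copied.

```python
class CowSnapshot:
    """Holds only the original blocks that were overwritten after the snapshot."""

    def __init__(self, source_blocks):
        self.source_blocks = source_blocks  # shared with the live volume
        self.saved = {}                     # block index -> original contents

    def read(self, index):
        # Use the preserved original if there is one, else follow the pointer.
        return self.saved.get(index, self.source_blocks[index])

class CowVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = CowSnapshot(self.blocks)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy the original block to each destination first, then update in place.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data

vol = CowVolume(["A", "B", "C", "D"])
snap = vol.snapshot()
vol.write(2, "C2")
print(vol.blocks)                        # ['A', 'B', 'C2', 'D']
print([snap.read(i) for i in range(4)])  # ['A', 'B', 'C', 'D']
print(snap.saved)                        # {2: 'C'}
```

The extra copy on every first write to a block is the reason the slide lists slower performance as a drawback, and it grows with the number of destination copies.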
Method 2: Redirect on Write (ROW)

[Figure: before the write, Source holds blocks A B C D; after the write, blocks A B C D are left in place and C2 is written to new space]

• Redirect-On-Write (ROW)
     – Copy is set of pointers to original data
     – Write to original volume:
           • Re-directed to new empty space
           • Previous data left alone
     – Does not impact performance
     – Supports many destination copies
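
Redirect-on-write can be sketched in the same toy style. Here a snapshot is just a frozen copy of the pointer map; the class name and layout are assumptions made for illustration.

```python
class RowVolume:
    def __init__(self, blocks):
        self.store = list(blocks)            # physical blocks, append-only
        self.map = list(range(len(blocks)))  # logical index -> physical index

    def snapshot(self):
        # A snapshot is a frozen set of pointers; no data is copied.
        return list(self.map)

    def write(self, index, data):
        # New data goes to empty space; the previous block is left alone.
        self.store.append(data)
        self.map[index] = len(self.store) - 1

    def read(self, index, pointers=None):
        pointers = self.map if pointers is None else pointers
        return self.store[pointers[index]]

vol = RowVolume(["A", "B", "C", "D"])
snap = vol.snapshot()
vol.write(2, "C2")
print(vol.store)                            # ['A', 'B', 'C', 'D', 'C2']
print([vol.read(i) for i in range(4)])      # ['A', 'B', 'C2', 'D']
print([vol.read(i, snap) for i in range(4)])  # ['A', 'B', 'C', 'D']
```

A write touches only the new location and one pointer, which is consistent with the slide's claims about performance and the number of copies supported.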


Space-Efficient Copies
• Pros
     – Supports both Fully-allocated and Thin-Provisioned Sources
     – Reduces the data footprint and lowers costs
     – Allows you to keep more copies online
     – Allows you to take copies more frequently
           • Can be used as checkpoint copies during batch processing
• Cons
     – Some implementations may impact I/O performance
     – Requires that you estimate the maximum percentage changed
           • Typically 10-20 %
     – Exceeding the reserved space invalidates destination copy



Agenda

            •     Thin Provisioning
            •     Space-Efficient Copy
            •     Data Deduplication
            •     Compression




History of Data Deduplication


 2007    Advanced Single Instance Store (A-SIS) brings deduplication to
         the IBM N series and NetApp disk storage
 2008    IBM acquires Diligent and introduces the ProtecTIER TS7600
         virtual tape library with data deduplication
 Today   IBM offers a variety of choices, including ProtecTIER, N series,
         and Tivoli Storage Manager (TSM v6)

Data Deduplication


 • Data deduplication reduces capacity requirements by
   only storing one unique instance of the data on disk
   and creating pointers for duplicate data elements




Deduplication reduces disk
required for backup copies




Two Primary Data Deduplication
  Approaches




  • Hash-based Deduplication: sometimes referred to as a Content
    Addressable Storage approach
  • HyperFactor: a different approach based on an agnostic view of data




Hash-Based Approach

       1. Slice data into chunks (fixed or variable)

               A                B      C   D           E


       2. Generate Hash per chunk and save
            Ah Bh Ch Dh Eh

       3. Slice next data into chunks and look for Hash Match

               A                B      C   D           E


       4. Reference data previously stored
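
The four steps above can be sketched as follows. This is a minimal illustration with several assumptions: fixed-size chunks, SHA-1 as the hash (its 20-byte digest matches the hash length used later in the deck), and a plain dictionary as the chunk store. It does not handle hash collisions.

```python
import hashlib

def dedupe(stream, store, chunk_size=4):
    """Slice the stream into chunks and return one hash reference per chunk."""
    refs = []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        digest = hashlib.sha1(chunk).digest()  # 20-byte hash per chunk
        if digest not in store:
            store[digest] = chunk              # unique data is stored once
        refs.append(digest)                    # duplicates become references
    return refs

def rehydrate(refs, store):
    """Rebuild the original stream from its hash references."""
    return b"".join(store[digest] for digest in refs)

store = {}
first = dedupe(b"AAAABBBBCCCCDDDDEEEE", store)
second = dedupe(b"AAAABBBBXXXXDDDDEEEE", store)
print(len(store))                # 6 unique chunks stored for 10 chunks written
print(rehydrate(second, store))  # b'AAAABBBBXXXXDDDDEEEE'
```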
HyperFactor Approach

       1. Look through data for similarity

                                         New Data Stream


       2. Read elements that are most similar
       3. Diff reference with version – will use several elements

            Element A                 Element B            Element C


       4. Matches factored out – unique data added to repository


Assessment of Hash-based
Approaches
Example: Imagine a chunk size of 8 KB
• 1 TB repository has ~125,000,000 8 KB chunks
• Each hash is 20 bytes long
• Need pointers scheme to reference 1 TB
The hash-table requires 2.5 GB RAM
     » no issue
With a 100 TB repository
     » ~250 GB of RAM is required

• Applicable for all chunking methods
• Hash Table in Memory
     – Overhead for in-band deduplication
     – Hash table will grow with data volume
     – Growing hash-table may become performance bottleneck
     – Scalability issues
• Hash-Collisions must be handled
• Hash table must be protected
     – One copy might not be sufficient
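
The memory figures in this example can be checked with a short calculation. The sketch assumes decimal units (1 KB = 1,000 bytes), which is what the slide's figure of ~125,000,000 chunks implies, and it counts only the 20-byte hashes, not the pointer scheme.

```python
def hash_table_bytes(repository_bytes, chunk_bytes=8_000, hash_bytes=20):
    # One hash entry per chunk in the repository.
    chunks = repository_bytes // chunk_bytes
    return chunks * hash_bytes

TB = 10**12
GB = 10**9

print(TB // 8_000)                        # 125000000 chunks in 1 TB
print(hash_table_bytes(1 * TB) / GB)      # 2.5
print(hash_table_bytes(100 * TB) / GB)    # 250.0
```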


When Deduplication Occurs
1. In-line Processing
     –     As data is received by the target device it is
           • Deduplicated in real time
           • Only unique data stored on disk
     –     Data written to the disk storage is deduplicated


2. Post-Processing
     –     As data is received by the target device it is
           • Temporarily stored on disk storage
     –     Data is subsequently read back in to be processed by a
           deduplication engine



Comparison of Offerings

                      Hash-based             HyperFactor

    In-line           Other vendors          IBM ProtecTIER
    Process                                    –TS7680G
                                               –TS7650G
                                               –TS7650
                                               –TS7620 Express
                                               –TS7610 Express
    Post-             • IBM Tivoli Storage
    Process             Manager (TSM)
                      • N series

IBM ProtecTIER with HyperFactor

                                    • Gateways
                                      – Attaches up to 1PB of disk
                                      – Two models:
                                         • TS7680 for IBM System z
                                         • TS7650G for distributed systems


                                    • Appliances
                                      – Disk included inside
                                      – Three models for distributed
                                        systems
                                         • TS7650 … in three sizes
                                         • TS7620 (New!)
                                         • TS7610 ... in two sizes
ProtecTIER vs.
Tivoli Storage Manager
 Both Solutions Offer the Benefits of Target side Deduplication:
  –   Greatly reduced storage capacity requirements
  –   Lower operational costs, energy usage and TCO
  –   Faster recoveries with more data on disk

 Complementary Solutions Today! Can be used together, but don’t
 deduplicate the same data twice

 Use ProtecTIER (IBM TS7600) When:
  –   Highest performance and capacity scaling are required!
  –   Up to 1400 MB/sec (2.5 GB/s with 2 nodes) deduplication rates are needed
  –   Deduplicated capacities up to 25 PB are required
  –   You wish to avoid operational impact of post processing deduplication
  –   A VTL appliance model is desired
  –   Deduplicating across multiple TSM (or other backup) servers

 Use TSM 6 Built-in Deduplication When:
  –   You desire deduplication operations be completely integrated within TSM
  –   The benefits of deduplication are desired without separate hardware or
      software dependencies or licenses (ships with TSM Extended Edition)
  –   You desire end to end data lifecycle management with minimized data
      store
Data Deduplication
• Pros
     – Designed for backups
     – Can offer up to 25x data footprint reduction
           • Allows disk backup repositories to approach cost of
             tape-based solutions
     – Allows more backup copies to remain on disk for faster restores
     – Available with a variety of interfaces, including VTL, OST and NAS
• Cons
     – Dealing with Hash Collisions
           • May require byte-for-byte comparisons or keeping
             secondary copy of data
     – Some systems do not scale
     – Some systems have slow restores
           • Re-hydrating data back to normal
     – Primary data may not dedupe very well
           • Your mileage may vary!


Agenda

            •     Thin Provisioning
            •     Space-Efficient Copy
            •     Data Deduplication
            •     Compression




History of Compression
 1973    NASA and IBM developed the Houston Automatic Spooling Priority
         (HASP) system with compression for long distance data
         transmission
 1986    IBM introduced the Improved Data Recording Capability (IDRC)
         for the 3480 tape drive
 Today   IBM offers real-time compression for file and block level
         access to disk storage


Lossy vs. Lossless Methods

[Figure: two Compress / Decompress round trips, one ending "Good enough?" and one ending "Exactly the same"]

• Lossy
     – Decompress does not return data back to its original contents
     – Used with music, photos, video, medical images, scanned
       documents, fax machines
• Lossless
     – Decompress returns data back to its original contents
     – Used with databases, emails, spreadsheets, office documents,
       source code
How Compression Works




     • Lempel-Ziv lossless compression builds a dictionary of repeated
       phrases: sequences of two or more characters that can be
       represented with fewer bits
     • In the above excerpt from “Lord of the Rings”, all of the red text
       represents repeated sequences eligible for compression!

Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
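
The dictionary idea can be sketched with LZ78, one member of the Lempel-Ziv family. The slide does not say which variant it has in mind, so this choice is an assumption, and the sample text is made up for illustration. Each output pair refers to an earlier dictionary phrase plus one new character.

```python
def lz78_encode(text):
    """Return (dictionary index, next character) pairs for the text."""
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase already seen
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        pairs.append((dictionary[phrase], ""))  # flush a trailing known phrase
    return pairs

def lz78_decode(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

text = "the ring, the ring, the ring"
pairs = lz78_encode(text)
print(len(text), len(pairs))       # fewer pairs than characters
print(lz78_decode(pairs) == text)  # True
```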
Compressed Volumes
Allocated but unused space
dedicated to this host,
wasted until written to                       Physical Space
                                              Allocated

Actual data written                           Actual data written




 Host sees full
 virtual amount
                                       Physical Space
                                       Allocated, up to 80%
                             Actual    reduction from actual
                             data      data written
                             written



Real-time Compression!
[Figure: Workstations and Application Servers on an IP Network, in front of a Disk Array with Cache]

• Real-time Compression for primary data
     – Less data stored on primary storage (up to 80%)
     – No changes to applications or procedures

• Before it gets to the storage array
     – Larger effective storage cache
     – Disk Array can serve more requests from its read / write cache
     – Lower storage CPU overhead

• Does not cause performance degradation
     – Much smaller I/O / lower disk workload
     – Reads/Writes are faster due to storage array’s response from
       cache instead of disk
     – Additionally reads may come from advanced read ahead cache
       (no write cache)


FIVO vs. VIFO
[Figure: two panels, each mapping Data blocks 1-6 to Compressed Data blocks 1-6]

• Fixed Input, Variable Output (FIVO)
       – WAN transmission
       – Sequential tape
       – IBM Tivoli Storage Manager
       – zip, tar, etc.
• Variable Input, Fixed Output (VIFO)
       – Random Access Compression Engine™ (RACE)
       – IBM Real-Time Compression Appliances
       – IBM SVC, Storwize V7000
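
The difference between the two schemes can be sketched with Python's standard `zlib` module. This is only an illustration of the idea and not how RACE is implemented: `fivo` compresses fixed-size input slices into outputs of varying size, while `vifo` greedily grows each input slice for as long as its compressed form still fits a fixed output budget. The function names, the `step` parameter, and the sample data are assumptions.

```python
import zlib

def fivo(data, in_size):
    """Fixed input, variable output: compress fixed-size slices."""
    return [zlib.compress(data[i:i + in_size])
            for i in range(0, len(data), in_size)]

def vifo(data, out_size, step=64):
    """Variable input, fixed output: each result fits within out_size bytes."""
    blocks = []
    pos = 0
    while pos < len(data):
        end = min(pos + step, len(data))
        best = zlib.compress(data[pos:end])
        if len(best) > out_size:
            raise ValueError("out_size is too small for one step of input")
        while end < len(data):
            trial_end = min(end + step, len(data))
            trial = zlib.compress(data[pos:trial_end])
            if len(trial) > out_size:
                break  # adding more input would overflow the output block
            end, best = trial_end, trial
        blocks.append((end - pos, best))  # (input bytes consumed, compressed bytes)
        pos = end
    return blocks

data = b"".join(b"record %d: value\n" % i for i in range(500))
fixed_in = fivo(data, 1024)
fixed_out = vifo(data, 256)
print(sorted({len(c) for c in fixed_in}))      # compressed sizes produced by FIVO
print(sorted({n for n, _ in fixed_out}))       # input sizes consumed by VIFO
print(max(len(c) for _, c in fixed_out) <= 256)  # True
```

With fixed-size outputs, a block's position on disk can be computed or looked up without scanning earlier data, which is what makes random access to compressed data practical.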



Compression for Disk data
Traditional Approaches
     File:                  A B C | D E F | G H I
     Compressed File:       ABC | DEF | GHI

     Compression after Modification
     File:                  A B C | D MN F | G H I
     New Compressed File:   ABC | DMN | FGH | I     (Blocks Shift)

     •   Extra work to ‘edit’ a file
     •   All blocks shift
                – Only one common block (this example)
                – Negative impact to deduplication
     •   No notion of data location

Real-time Compression
     Compression after Modification
     File:                  A B C | D MN F | G H I
     New Compressed File:   ABC | DEF1 | GHI | MN   (Identical Blocks)

     •   Small amount of work / I/O to edit
     •   Only modified block changes
                – Multiple common blocks
                – Enhances deduplication
     •   Data location via map


Compression Without Compromise
Expected Compression Ratios

            Databases                                              Up to 80%

            Server Virtualization         Linux virtual OSes       Up to 70%
                                          Windows virtual OSes     Up to 55%

            Collaboration                 Office 2003              Up to 75%
                                          Office 2007 or later     Up to 25%

            CAD/CAM Engineering/Design                             Up to 75%




Objectives:
• Run over a block device
• Estimate:
     – Portion of non-zero blocks in the volume.
     – Compression rate of non-zero blocks with RTC.
Performance:
• Runs FAST! < 60 seconds, regardless of volume size
    – Typical running time on a machine with multiple disks: < 20 seconds
• Gives guarantees on the estimate: ~5% maximum error
    – Can tighten the guarantee with more running time


Method:
• Random sampling and compression throughout the volume
• Collect enough non-zero samples to gain desired confidence
        – More zero blocks → slower (takes more time to find non-zero blocks)
• Mathematical analysis gives confidence guarantees

•    Note: we are estimating compression during migration of a volume into RTC (data at rest)
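
The sampling method described above can be sketched roughly as follows. This is a minimal illustration, not the actual estimation tool: it uses `zlib` as a stand-in for the RTC compressor, treats an in-memory `bytes` object as the volume, and makes no statistical error guarantee. The function name, block size, and sample count are all assumptions.

```python
import random
import zlib

def estimate_compression(volume: bytes, block_size: int = 4096,
                         samples: int = 200, seed: int = 0):
    """Return (non-zero block fraction, estimated savings on non-zero blocks)."""
    n_blocks = len(volume) // block_size
    if n_blocks == 0:
        raise ValueError("volume is smaller than one block")
    rng = random.Random(seed)
    zero_block = bytes(block_size)
    tried = nonzero = raw_bytes = compressed_bytes = 0
    max_tries = samples * 50  # give up eventually on mostly-zero volumes
    while nonzero < samples and tried < max_tries:
        index = rng.randrange(n_blocks)          # random sampling over the volume
        block = volume[index * block_size:(index + 1) * block_size]
        tried += 1
        if block == zero_block:
            continue                             # zero blocks cost time but add no sample
        nonzero += 1
        raw_bytes += len(block)
        compressed_bytes += len(zlib.compress(block))
    nonzero_fraction = nonzero / tried
    savings = 1 - compressed_bytes / raw_bytes if raw_bytes else 0.0
    return nonzero_fraction, savings

# Hypothetical test volume: alternating zero blocks and repetitive text blocks.
text_block = (b"data footprint reduction " * 200)[:4096]
test_volume = b"".join(bytes(4096) if i % 2 else text_block for i in range(256))
frac, savings = estimate_compression(test_volume)
print(f"non-zero fraction ~{frac:.2f}, savings on non-zero blocks ~{savings:.2f}")
```

Because the loop only stops once enough non-zero samples are collected, a volume with more zero blocks needs more reads, which matches the "more zero blocks, slower" note in the method.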



IBM Real-Time Compression
• For NAS devices
     – IBM Real-Time Compression Appliance
         (STN 6500, STN 6800)
• For Block devices
     – SAN Volume Controller
     – Storwize V7000




Migrating to Compressed Disk

   [Diagram: volume mirror from Copy 0 (fully-allocated or thin-provisioned volume) to Copy 1 (compressed volume)]

   • Only non-zero blocks copied
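
A toy model of this migration is sketched below. It assumes a volume is a `bytes` object, uses `zlib` in place of the real compression engine, and stores the compressed copy as a dictionary keyed by block number. None of these names reflect the actual SAN Volume Controller or Storwize V7000 implementation.

```python
import zlib

def migrate_to_compressed(copy0: bytes, block_size: int = 4096) -> dict[int, bytes]:
    """Build Copy 1 from Copy 0, skipping blocks that contain only zeros."""
    copy1: dict[int, bytes] = {}
    for offset in range(0, len(copy0), block_size):
        block = copy0[offset:offset + block_size]
        if block == bytes(len(block)):
            continue                      # zero block: nothing is copied
        copy1[offset // block_size] = zlib.compress(block)
    return copy1

def read_block(copy1: dict[int, bytes], block_number: int,
               block_size: int = 4096) -> bytes:
    """Read a block back; blocks that were never copied read as zeros."""
    stored = copy1.get(block_number)
    return bytes(block_size) if stored is None else zlib.decompress(stored)

# Hypothetical four-block volume: zero, data, zero, data.
volume = bytes(4096) + b"x" * 4096 + bytes(4096) + b"payload!" * 512
compressed_copy = migrate_to_compressed(volume)
print(sorted(compressed_copy))
```

Only blocks 1 and 3 end up in the compressed copy, yet every block still reads back with its original contents.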

Data Compression
• Pros
     – Can be used for data transmission, tape and disk data
     – Can offer up to 80% data footprint reduction
     – Available as front-end appliance or integrated into storage system
     – Can be “Dedupe-Friendly”

• Cons
     – Some implementations are post-process
          • Store uncompressed data first, compress later
     – Some implementations impact performance and/or consume substantial CPU resources
     – Benefits vary by data type, and whether applications do their own compression or encryption
          • Your mileage may vary


Thank You!


    Session:    sSE20
    Presenters: Tony Pearson,
                Sanjay Bhikot


Intel, the Intel logo, Xeon and Xeon Inside are trademarks or registered
trademarks of Intel Corporation in the U.S. and /or other countries.
Additional Resources

                                     Email:
                                     tpearson@us.ibm.com

                                     Twitter:
                                     http://twitter.com/az990tony

                                     Blog:
                                     http://ibm.co/brAeZ0

                                     Books:
                                     http://www.lulu.com/spotlight/990_tony

                                     IBM Expert Network:
                                     http://www.slideshare.net/az990tony




Trademarks and disclaimers
© IBM Corporation 2012. All rights reserved.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other
countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government
Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States,
other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL
is a registered trademark, and a registered community trademark of The Minister for the Cabinet Office, and is registered in the U.S. Patent and Trademark Office. UNIX is a
registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license
therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Other product and service names might be trademarks of IBM or other companies. Trademarks of International Business Machines Corporation in the United States, other
countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual
environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not
constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor
announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related
to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance,
function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to
communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and
the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated
here.

Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your
IBM representative or Business Partner for the most current pricing in your geography.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.




IT Service Management (ITSM) Best Practices for Advanced Computing
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 

Data Footprint Reduction: Understanding IBM Storage Options

  • 1. sSE20 Data Footprint Reduction: Understanding IBM Storage Efficiency Options Tony Pearson Master Inventor and Senior Managing Consultant, IBM Corp Sanjay S Bhikot Advisory Unix and Storage Administrator, Ricoh Americas Corp #IBMEDGE © 2012 IBM Corporation
  • 2. Data Footprint Reduction is the catch-all term for a variety of technologies designed to help reduce storage costs. This session will cover thin provisioning, space-efficient copies, deduplication and compression technologies, and describe the IBM storage products that provide these capabilities.
  • 3. Sessions -- Tony Pearson
      • Monday
          – 1:00pm Storing Archive Data for Compliance Challenges
          – 4:15pm IBM Watson: What it Means for Society
      • Tuesday
          – 4:15pm Using Social Media: Birds of a Feather (BOF)
      • Wednesday
          – 9:00am Data Footprint Reduction: IBM Storage options
          – 2:30pm IBM's Storage Strategy in the Smarter Computing era
          – 4:15pm IBM SONAS and the Cloud Storage Taxonomy
      • Thursday
          – 9:00am IBM Watson: What it Means for Society
          – 10:30am Tivoli Storage Productivity Center Overview
          – 5:30pm IBM Edge “Free for All” hosted by Scott Drummond
  • 4. Agenda • Thin Provisioning • Space-Efficient Copy • Data Deduplication • Compression
  • 5. History of Thin Provisioning
      • 1994: The StorageTek Iceberg 9200 Array introduced Thin Provisioning on slower 7200RPM drives for mainframe systems
      • 1997: IBM resold this as the RAMAC Virtual Array (RVA) for mainframe servers
      • Today: Thin Provisioning is available for many operating systems on IBM storage, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500 and DCS3700
  • 6. Why Space is Over-Allocated
      • Scenario 1
          – Space requirements under-estimated
          – Running out of space requires larger volume
          – New request may take weeks to accommodate
              • Application outage if not addressed in time
          – Data must be moved to the larger volume
              • Application outage during data movement
      • Scenario 2
          – Space requirements over-estimated
          – Capacity lasts for years
              • No data migration
              • No application outages
              • No penalties
      When faced with this dilemma, most will err on the side of over-estimating
  • 7. Fully Allocated vs. Thin Provisioned
      • Fully Allocated
          – Host sees fully allocated amount
          – Actual data written
          – Allocated but unused space dedicated to this host, wasted until written to
      • Thin Provisioned
          – Host sees full virtual amount
          – Physical Space Allocated
          – Actual data written
          – Empty space available to others
  • 8. Fully Allocated vs. Thin Provisioned
      • Host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn
      • Volume/LUN – one or more extents
      • Extent – Allocation Unit, one or more grains
      • Grain – range of 1 or more blocks
      • Block – typically 512 or 4096 bytes
  • 9. Coarse and Fine-Grain
      [Figure: a volume of blocks 00-99 shown three ways, with blocks 00, 55, and 99 written]
      • Fully Allocated, all 10 extents allocated
      • Coarse-Grain, only 3 extents allocated (grain 90-99 = extent)
      • Fine-Grain, only 1 extent allocated (grains 00-01, 54-55 and 98-99)
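The extent counts on slide 9 can be reproduced with a small sketch. It assumes a 100-block volume with 10-block extents, a coarse grain equal to one extent, and a fine grain of 2 blocks (as in the slide's "Grain 54-55" label). The function name `extents_allocated` and the packing rule (touched grains are packed densely into extents) are illustrative assumptions, not a description of any IBM product.

```python
import math

def extents_allocated(written_blocks, grain_blocks, extent_blocks):
    """Extents consumed when every touched grain is packed densely into extents."""
    # A grain is allocated as soon as any block inside it is written.
    grains_touched = {block // grain_blocks for block in written_blocks}
    blocks_needed = len(grains_touched) * grain_blocks
    return math.ceil(blocks_needed / extent_blocks)

VOLUME_BLOCKS = 100
EXTENT_BLOCKS = 10
written = [0, 55, 99]  # the three blocks written on the slide

fully_allocated = VOLUME_BLOCKS // EXTENT_BLOCKS  # every extent reserved up front
coarse = extents_allocated(written, grain_blocks=10, extent_blocks=EXTENT_BLOCKS)
fine = extents_allocated(written, grain_blocks=2, extent_blocks=EXTENT_BLOCKS)

print(fully_allocated, coarse, fine)  # 10 3 1
```

With a fine grain, the three touched grains occupy only 6 blocks, so they fit in a single extent.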
  • 10. How IBM has implemented TP
      • IBM DS8000: Type Coarse; Allocation Unit 1 GB
      • IBM XIV: Type Fine; Allocation Unit 17 GB; Grain size 1 MB
      • SVC and Storwize V7000: Type Fine; Allocation Unit 16MB to 8GB; Grain size 32-256 KB
      • DS3500, DCS3700: Type Fine; Allocation Unit 4 GB; Grain size 64 KB
  • 11. Thick-to-Thin Migration
      • Volume mirror: Copy 0 is the fully-allocated volume, Copy 1 is the thin-provisioned volume
      • Only non-zero blocks copied
  • 12. Empty Space Reclaim
      • Thin Provisioning: allocations in 17GB units, with 1MB chunks (grains). Only non-zero blocks consume physical space.
      • Avoid writing empty blocks: any I/O request that tries to write a block of all zeros to unallocated space is ignored.
      • Background task to find empty chunks: a background task scans all blocks, looking for chunks containing all zeros.
      • Empty space reclaimed: empty chunks are returned to unallocated space, so that they can be used for other volumes.
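The zero-detect and reclaim behavior on slide 12 can be sketched as a toy in-memory volume. The class `ThinVolume`, its method names, and the 4-byte grain are hypothetical and chosen for illustration only (the slide describes 1MB grains); this is not how any specific array implements it.

```python
GRAIN = 4  # bytes per grain, tiny for illustration (the slide describes 1MB grains)
ZERO_GRAIN = bytes(GRAIN)

class ThinVolume:
    def __init__(self):
        self.grains = {}  # grain index -> data; only allocated grains appear here

    def write(self, index, data):
        if len(data) != GRAIN:
            raise ValueError("writes must be exactly one grain")
        # An all-zero write to unallocated space is ignored: nothing is allocated.
        if data == ZERO_GRAIN and index not in self.grains:
            return
        self.grains[index] = data

    def read(self, index):
        # Unallocated grains read back as zeros.
        return self.grains.get(index, ZERO_GRAIN)

    def reclaim(self):
        """Background-style scan: return all-zero grains to unallocated space."""
        empty = [i for i, data in self.grains.items() if data == ZERO_GRAIN]
        for i in empty:
            del self.grains[i]
        return len(empty)

vol = ThinVolume()
vol.write(0, b"data")
vol.write(1, ZERO_GRAIN)        # ignored, grain 1 stays unallocated
vol.write(0, ZERO_GRAIN)        # zeros over an allocated grain are stored
allocated_before = len(vol.grains)
reclaimed = vol.reclaim()       # the scan finds grain 0 is now empty
allocated_after = len(vol.grains)
```

The host still reads zeros from a reclaimed grain, so reclaiming is invisible to the application.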
  • 13. Thin Provisioning
      • Pros
          – Just-in-Time increased utilization percentage
          – Eliminates the pressure to make accurate space estimates
          – Dynamically expand volume without impacting applications or rebooting server
          – Reduces the data footprint and lowers costs
          – Shifts focus from volumes to storage pool capacity
      • Cons
          – Not all file systems are cooperative or friendly
              • Deletion of files does not free space for others
              • “sdelete” writes zeros over deleted file space
          – Some implementations may impact I/O performance
          – May not support same set of features, copy services, or replication
          – “Writing checks you can’t cash”
  • 14. Agenda • Thin Provisioning • Space-Efficient Copy • Data Deduplication • Compression
  • 15. History of Space-Efficient Copies
      • 1993: NetApp introduces Snapshot in its WAFL file system
      • 1997: IBM Enterprise Storage Server (ESS) introduces NOCOPY parameter on FlashCopy
      • Today: Space-Efficient Copy is available on many IBM storage systems, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500, DS5000 and DCS3700
  • 16. Space-Efficient Copies
      • Source: 100 GB allocated, 40 GB written
      • Traditional Copies: Destination 1, Destination 2, Destination 3, 300 GB
      • Space-Efficient Copies: 10% reserved, 30 GB
  • 17. Method 1: Copy on Write (COW)
      • Copy-On-Write (COW)
          – Copy is set of pointers to original data
          – Write to original volume:
              • Pause I/O
              • Copy original block of data to destination
              • Update original block
          – Slows performance
          – May limit # of destination copies
          – Can be combined with background copy for a full copy
      [Figure: source blocks A B C D; after a write, source holds A B C2 D and destination holds C]
  • 18. Method 2: Redirect on Write (ROW)
      • Redirect-On-Write (ROW)
          – Copy is set of pointers to original data
          – Write to original volume:
              • Re-directed to new empty space
              • Previous data left alone
          – Does not impact performance
          – Supports many destination copies
      [Figure: source blocks A B C D; after a write, blocks A B C D are left alone and C2 is written to new space]
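The two methods on slides 17 and 18 can be contrasted in a minimal sketch. The classes `CowSnapshot` and `RowVolume` are hypothetical illustrations using Python lists as "volumes"; real arrays track pointers in metadata and handle many snapshots, which this sketch does not attempt.

```python
class CowSnapshot:
    """Copy-on-write: the original block is copied out before it is overwritten."""

    def __init__(self, source):
        self.source = source  # live volume, updated in place
        self.saved = {}       # block index -> original data preserved for the copy

    def write(self, index, data):
        if index not in self.saved:
            # Extra copy on the write path: this is what slows performance.
            self.saved[index] = self.source[index]
        self.source[index] = data

    def read_snapshot(self, index):
        # Unchanged blocks are still read through to the source.
        return self.saved.get(index, self.source[index])


class RowVolume:
    """Redirect-on-write: new data goes to empty space, old data is left alone."""

    def __init__(self, blocks):
        self.store = list(blocks)             # physical blocks
        self.live = list(range(len(blocks)))  # pointers for the live volume
        self.snap = list(self.live)           # pointers frozen at snapshot time

    def write(self, index, data):
        self.store.append(data)               # redirected to new empty space
        self.live[index] = len(self.store) - 1

    def read_live(self, index):
        return self.store[self.live[index]]

    def read_snapshot(self, index):
        return self.store[self.snap[index]]


source = ["A", "B", "C", "D"]
cow = CowSnapshot(source)
cow.write(2, "C2")   # source becomes A B C2 D, the copy keeps C

row = RowVolume(["A", "B", "C", "D"])
row.write(2, "C2")   # store becomes A B C D C2, nothing is copied
```

In the COW case every first write to a block costs an extra copy; in the ROW case the write only updates a pointer, which matches the performance notes on the slides.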
  • 19. Space-Efficient Copies
      • Pros
          – Supports both Fully-allocated and Thin-Provisioned Sources
          – Reduces the data footprint and lowers costs
          – Allows you to keep more copies online
          – Allows you to take copies more frequently
          – Can be used as checkpoint copies during batch processing
      • Cons
          – Some implementations may impact I/O performance
          – Requires that you estimate the maximum percentage changed
              • Typically 10-20%
          – Exceeding the reserved space invalidates destination copy
  • 20. Agenda • Thin Provisioning • Space-Efficient Copy • Data Deduplication • Compression
  • 21. History of Data Deduplication
      • 2007: Advanced Single Instance Store (A-SIS) brings deduplication to the IBM N series and NetApp disk storage
      • 2008: IBM acquires Diligent and introduces the ProtecTIER TS7600 virtual tape library with data deduplication
      • Today: IBM offers a variety of choices, including ProtecTIER, N series, and Tivoli Storage Manager (TSM v6)
  • 22. Data Deduplication • Data deduplication reduces capacity requirements by only storing one unique instance of the data on disk and creating pointers for duplicate data elements
  • 23. Deduplication reduces disk required for backup copies
  • 24. Two Primary Data Deduplication Approaches
      • Hash based: sometimes referred to as a Content Addressable Storage approach
      • HyperFactor Deduplication: a different approach based on an agnostic view of data
  • 25. Hash-Based Approach
      1. Slice data into chunks (fixed or variable)
      2. Generate Hash per chunk and save
      3. Slice next data into chunks and look for Hash Match
      4. Reference data previously stored
      [Figure: chunks A B C D E with hashes Ah Bh Ch Dh Eh, matched against the next stream A B C D E]
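The four steps on slide 25 can be sketched with fixed-size chunks and a dictionary keyed by hash. The use of SHA-1 is an assumption made here because it produces 20-byte digests; the function names `dedupe` and `rehydrate` are hypothetical, and hash collisions are deliberately not handled in this sketch.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int, store: dict) -> list:
    """Store each unique chunk once and return the list of hashes (the 'recipe')."""
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]   # step 1: slice into chunks
        digest = hashlib.sha1(chunk).digest()      # step 2: hash per chunk
        if digest not in store:                    # step 3: look for a hash match
            store[digest] = chunk                  # unique data is stored once
        recipe.append(digest)                      # step 4: reference stored data
    return recipe

def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original stream from its chunk references."""
    return b"".join(store[digest] for digest in recipe)

store = {}
first_backup = b"AAAABBBBCCCCDDDDEEEE"
second_backup = b"AAAABBBBXXXXDDDDEEEE"   # one chunk changed since the first backup

recipe_1 = dedupe(first_backup, 4, store)
recipe_2 = dedupe(second_backup, 4, store)
unique_chunks = len(store)                 # 5 from the first stream + 1 new chunk
```

Ten chunks were presented but only six are stored, and both backups can still be rebuilt with `rehydrate`.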
  • 26. HyperFactor Approach
      1. Look through data for similarity
      2. Read elements that are most similar
      3. Diff reference with version – will use several elements
      4. Matches factored out – unique data added to repository
      [Figure: a new data stream compared against Element A, Element B and Element C]
  • 27. Assessment of Hash-based Approaches
      • Applicable for all chunking methods
      • Hash Table in Memory
          – Overhead for in-band deduplication
          – Hash table will grow with data volume
          – Growing hash-table may become performance bottleneck
          – Scalability issues
      • Hash-Collisions must be handled
      • Hash table must be protected
          – One copy might not be sufficient
      Example: Imagine a chunk size of 8 KB
      • 1 TB repository has ~125,000,000 8 KB chunks
      • Each hash is 20 bytes long
      • Need pointers scheme to reference 1 TB
      The hash-table requires 2.5 GB RAM » no issue
      With a 100 TB repository » ~250 GB of RAM is required
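The sizing example on slide 27 is simple arithmetic and can be checked directly. The sketch assumes decimal units (1 TB = 10^12 bytes, 8 KB = 8,000 bytes), which is what makes the slide's round numbers come out exactly, and it counts only the hashes, not the pointer scheme the slide also mentions. The function name `hash_table_bytes` is hypothetical.

```python
TB = 10**12
GB = 10**9

def hash_table_bytes(repository_bytes, chunk_bytes=8_000, hash_bytes=20):
    """Return (number of chunks, bytes of RAM needed to hold one hash per chunk)."""
    chunks = repository_bytes // chunk_bytes
    return chunks, chunks * hash_bytes

chunks_1tb, ram_1tb = hash_table_bytes(1 * TB)
chunks_100tb, ram_100tb = hash_table_bytes(100 * TB)

print(chunks_1tb, ram_1tb / GB)      # 125000000 2.5
print(chunks_100tb, ram_100tb / GB)  # 12500000000 250.0
```

The table grows linearly with repository size, which is the scalability concern the slide raises.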
  • 28. When Deduplication Occurs
      1. In-line Processing
          – As data is received by the target device it is
              • Deduplicated in real time
              • Only unique data stored on disk
          – Data written to the disk storage is deduplicated
      2. Post-Processing
          – As data is received by the target device it is
              • Temporarily stored on disk storage
          – Data is subsequently read back in to be processed by a deduplication engine
  • 29. Comparison of Offerings
      • In-line Process
          – Hash-based: Other vendors
          – HyperFactor: IBM ProtecTIER (TS7680G, TS7650G, TS7650, TS7620 Express, TS7610 Express)
      • Post-Process
          – Hash-based: IBM Tivoli Storage Manager (TSM), N series
  • 30. IBM ProtecTIER with HyperFactor
      • Gateways
          – Attaches up to 1PB of disk
          – Two models:
              • TS7680 for IBM System z
              • TS7650G for distributed systems
      • Appliances
          – Disk included inside
          – Three models for distributed systems
              • TS7650 … in three sizes
              • TS7620 (New!)
              • TS7610 ... in two sizes
  • 31. ProtecTIER vs. Tivoli Storage Manager
      • Both Solutions Offer the Benefits of Target side Deduplication:
          – Greatly reduced storage capacity requirements
          – Lower operational costs, energy usage and TCO
          – Faster recoveries with more data on disk
      • Complementary Solutions Today! Can be used together but don’t deduplicate the same data twice
      • Use ProtecTIER (IBM TS7600) When:
          – Highest performance and capacity scaling are required!
          – Up to 1400 MB/sec (2.5GB/s with 2 node) deduplication rates are needed
          – Deduplicated capacities up to 25 PB are required
          – You wish to avoid operational impact of post processing deduplication
          – A VTL appliance model is desired
          – Deduplicating across multiple TSM (or other backup) servers
      • Use TSM 6 Built-in Deduplication When:
          – You desire deduplication operations be completely integrated within TSM
          – The benefits of deduplication are desired without separate hardware or software dependencies or licenses (ships with TSM Extended Edition)
          – You desire end to end data lifecycle management with minimized data store
  • 32. Data Deduplication
      • Pros
          – Designed for backups
          – Can offer up to 25x data footprint reduction
              • Allows disk backup repositories to approach cost of tape-based solutions
          – Allows more backup copies to remain on disk for faster restores
          – Available with a variety of interfaces, including VTL, OST and NAS
      • Cons
          – Dealing with Hash Collisions
              • May require byte-for-byte comparisons or keeping secondary copy of data
          – Some systems do not scale
          – Some systems have slow restores
              • Re-hydrating data back to normal
          – Primary data may not dedupe very well
              • Your mileage may vary!
  • 33. Agenda • Thin Provisioning • Space-Efficient Copy • Data Deduplication • Compression
  • 34. History of Compression
      • 1973: NASA and IBM developed the Houston Aerospace Spooling Protocol (HASP) with compression for long distance data transmission.
      • 1986: IBM introduced the Improved Data Recording Capability (IDRC) for the 3480 tape drive
      • Today: IBM offers real-time compression for file and block level access to disk storage
  • 35. Lossy vs. Lossless Methods
      • Lossy
          – Decompress does not return data back to its original contents (“Good enough?”)
          – Used with music, photos, video, medical images, scanned documents, fax machines
      • Lossless
          – Decompress returns data back to its original contents (“Exactly the same”)
          – Used with databases, emails, spreadsheets, office documents, source code
  • 36. How Compression Works
      • Lempel-Ziv lossless compression builds a dictionary of repeated phrases, sequences of two or more characters that can be represented with fewer bits
      • In the slide's excerpt from “Lord of the Rings” (image not reproduced in this transcript), all of the red text represents repeated sequences eligible for compression!
      Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
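The dictionary idea on slide 36 can be illustrated with a small LZ78-style encoder and decoder. LZ78 is chosen here as one member of the Lempel-Ziv family because it is short to write down; the slide does not say which variant IBM products use, and the function names are hypothetical.

```python
def lz78_encode(text: str) -> list:
    """Encode text as (dictionary index, next character) pairs."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch   # keep extending while the phrase is already known
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:
        output.append((dictionary[phrase], ""))  # flush a trailing known phrase
    return output

def lz78_decode(pairs: list) -> str:
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)

sample = "ababababab"
tokens = lz78_encode(sample)
restored = lz78_decode(tokens)
```

Repeated sequences collapse into references to earlier phrases, so the ten-character sample needs fewer tokens than characters, and decoding returns exactly the original text (lossless).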
  • 37. Compressed Volumes
      • Fully allocated: actual data written, plus allocated but unused space dedicated to this host, wasted until written to
      • Thin provisioned: host sees full virtual amount; physical space allocated for actual data written
      • Compressed: host sees full virtual amount; physical space allocated, up to 80% reduction from actual data written
  • 38. Real-time Compression!
      • Real-time Compression for primary data
          – Less data stored on primary storage (up to 80%)
          – No changes to applications or procedures
      • Before it gets to the storage array
          – Larger effective storage cache
          – Disk Array can serve more requests from its read / write cache
          – Lower storage CPU overhead
      • Does not cause performance degradation
          – Much smaller I/O / lower disk workload
          – Reads/Writes are faster due to storage array’s response from cache instead of disk
          – Additionally reads may come from advanced read ahead cache (no write cache)
      [Figure: Workstations, IP Network, Application Servers, Cache, Disk Array]
  • 39. FIVO vs. VIFO
      • Fixed Input, Variable Output
          – WAN transmission
          – Sequential tape
          – IBM Tivoli Storage Manager
          – zip, tar, etc.
      • Variable Input, Fixed Output
          – Random Access Compression Engine™ (RACE)
          – IBM Real-Time Compression Appliances
          – IBM SVC, Storwize V7000
  • 40. Compression for Disk data
      • Traditional Approaches (compression after modification)
          – Extra work to ‘edit’ a file
          – All blocks shift
              • Only one common block (this example)
              • Negative impact to deduplication
          – No notion of data location
      • Real-time Compression (compression after modification)
          – Small amount of work / I/O to edit
          – Only modified block changes
              • Multiple common blocks
              • Enhances deduplication
          – Data location via map
      [Figure: a file A B C D E F G H I is edited so that E becomes MN. Traditional: compressed blocks ABC DEF GHI become ABC DMN FGH I (blocks shift). Real-time: compressed blocks ABC DEF GHI become ABC DEF1 GHI MN (identical blocks preserved)]
  • 41. Compression Without Compromise: Expected Compression Ratios
      • Databases: Up to 80%
      • Server Virtualization
          – Linux virtual OSes: Up to 70%
          – Windows virtual OSes: Up to 55%
      • Collaboration
          – Office 2003: Up to 75%
          – Office 2007 or later: Up to 25%
      • Engineering/Design
          – CAD/CAM: Up to 75%
  • 42. Objectives:
      • Run over a block device
      • Estimate:
          – Portion of non-zero blocks in the volume.
          – Compression rate of non-zero blocks with RTC.
      Performance:
      • Runs FAST! < 60 seconds, no matter what the volume size
          – Typical running time on a machine with multiple disks: < 20 seconds
      • Give guarantees on the estimation: ~5% max error guarantee
          – Can improve guarantee with more running time
      Method:
      • Random sampling and compression throughout the volume
      • Collect enough non-zero samples to gain desired confidence
          – More zero blocks means a slower run (it takes more time to find non-zero blocks)
      • Mathematical analysis gives confidence guarantees
      • Note: we are estimating compression during migration of a volume into RTC (data at rest)
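The sampling method on slide 42 can be sketched as follows. This is an illustration of the idea only: it reads from an in-memory `bytes` object rather than a block device, uses `zlib` as a stand-in for the RTC compression engine, and takes a fixed number of samples instead of sampling until a confidence target is met. The function name `estimate_compression` and its parameters are hypothetical.

```python
import random
import zlib

def estimate_compression(volume: bytes, block_size=4096, samples=400, seed=0):
    """Estimate (fraction of non-zero blocks, space saving on non-zero blocks)."""
    rng = random.Random(seed)
    total_blocks = len(volume) // block_size
    zero_block = bytes(block_size)
    non_zero = 0
    raw_bytes = 0
    compressed_bytes = 0
    for _ in range(samples):
        i = rng.randrange(total_blocks)            # random sampling across the volume
        block = volume[i * block_size:(i + 1) * block_size]
        if block == zero_block:
            continue                               # zero blocks consume no space
        non_zero += 1
        raw_bytes += len(block)
        compressed_bytes += len(zlib.compress(block))  # zlib stands in for RTC
    non_zero_fraction = non_zero / samples
    saving = 1 - compressed_bytes / raw_bytes if raw_bytes else 0.0
    return non_zero_fraction, saving

# Synthetic volume: 100 blocks alternating repetitive data and all-zero blocks.
volume = (b"abcd" * 1024 + bytes(4096)) * 50
non_zero_fraction, saving = estimate_compression(volume)
```

Because only a sample is read, the running time depends on the number of samples rather than on the volume size, which is the property the slide highlights.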
  • 43. IBM Real-Time Compression
      • For NAS devices
          – IBM Real-Time Compression Appliance: STN 6500, STN 6800
      • For Block devices
          – SAN Volume Controller
          – Storwize V7000
  • 44. Migrating to Compressed Disk
      • Volume mirror: Copy 0 is the fully-allocated or thin-provisioned volume, Copy 1 is the compressed volume
      • Only non-zero blocks copied
  • 45. Data Compression
      • Pros
          – Can be used for data transmission, tape and disk data
          – Can offer up to 80% data footprint reduction
          – Available as front-end appliance or integrated into storage system
          – Can be “Dedupe-Friendly”
      • Cons
          – Some implementations are post-process
              • Stores uncompressed data first, compress later
          – Some implementations impact performance and/or consume substantial CPU resources
          – Benefits vary by data type, and whether applications do their own compression or encryption
              • Your mileage may vary
  • 46. Thank You! Session: sSE20 Presenters: Tony Pearson, Sanjay Bhikot #IBMEDGE Intel, the Intel logo, Xeon and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and /or other countries.
  • 47. Additional Resources
      • Email: tpearson@us.ibm.com
      • Twitter: http://twitter.com/az99Øtony
      • Blog: http://ibm.co/brAeZØ
      • Books: http://www.lulu.com/spotlight/99Ø_tony
      • IBM Expert Network: http://www.slideshare.net/az99Øtony
  • 48. Trademarks and disclaimers © IBM Corporation 2012. All rights reserved. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of The Minister for the Cabinet Office, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Other product and service names might be trademarks of IBM or other companies. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. 
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. References in this document to IBM products or services do not imply that IBM intends to make them available in every country.