Data Footprint Reduction: Understanding IBM Storage Options
Tony Pearson
sSE20 presented at IBM Edge 2012 conference
1.
sSE20
Data Footprint Reduction: Understanding IBM Storage Efficiency Options
Tony Pearson, Master Inventor and Senior Managing Consultant, IBM Corp
Sanjay S Bhikot, Advisory Unix and Storage Administrator, Ricoh Americas Corp
#IBMEDGE © 2012 IBM Corporation
2.
Data Footprint Reduction is the catch-all term for a variety of technologies designed to help reduce storage costs. This session will cover thin provisioning, space-efficient copies, deduplication and compression technologies, and describe the IBM storage products that provide these capabilities.
3.
Sessions -- Tony Pearson
• Monday
  – 1:00pm Storing Archive Data for Compliance Challenges
  – 4:15pm IBM Watson: What it Means for Society
• Tuesday
  – 4:15pm Using Social Media: Birds of a Feather (BOF)
• Wednesday
  – 9:00am Data Footprint Reduction: IBM Storage options
  – 2:30pm IBM's Storage Strategy in the Smarter Computing era
  – 4:15pm IBM SONAS and the Cloud Storage Taxonomy
• Thursday
  – 9:00am IBM Watson: What it Means for Society
  – 10:30am Tivoli Storage Productivity Center Overview
  – 5:30pm IBM Edge “Free for All” hosted by Scott Drummond
4.
Agenda
• Thin Provisioning
• Space-Efficient Copy
• Data Deduplication
• Compression
5.
History of Thin Provisioning
• 1994: The StorageTek Iceberg 9200 Array introduced Thin Provisioning on slower 7200RPM drives for mainframe systems
• 1997: IBM resold this as the RAMAC Virtual Array (RVA) for mainframe servers
• Today: Thin Provisioning is available for many operating systems on IBM storage, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500 and DCS3700
6.
Why Space is Over-Allocated
• Scenario 1
  – Space requirements under-estimated
  – Running out of space requires larger volume
  – New request may take weeks to accommodate
    • Application outage if not addressed in time
  – Data must be moved to the larger volume
    • Application outage during data movement
• Scenario 2
  – Space requirements over-estimated
  – Capacity lasts for years
    • No data migration
    • No application outages
    • No penalties
When faced with this dilemma, most will err on the side of over-estimating.
7.
Fully Allocated vs. Thin Provisioned
• Fully Allocated
  – Host sees fully allocated amount
  – Actual data written
  – Allocated but unused space dedicated to this host, wasted until written to
• Thin Provisioned
  – Host sees full virtual amount
  – Physical space allocated for actual data written
  – Empty space available to others
8.
Fully Allocated vs. Thin Provisioned
Host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn
• Volume/LUN – one or more extents
• Extent – Allocation Unit, one or more grains
• Grain – range of 1 or more blocks
• Block – typically 512 or 4096 bytes
9.
Coarse and Fine-Grain
Blocks 00, 55, and 99 written
• Fully Allocated: all 10 extents allocated
• Coarse-Grain: only 3 extents allocated (grain = extent, e.g. Grain 90-99)
• Fine-Grain: only 1 extent allocated (Grain 00-01, Grain 54-55, Grain 98-99)
[Figure: 10 x 10 grid of blocks 00-99, shown for Fully Allocated, Coarse-Grain and Fine-Grain]
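The extent counts on this slide can be reproduced with a little arithmetic. The sketch below assumes a 100-block volume, 10-block extents and 2-block fine grains, sizes read off the figure labels ("Grain 90-99 = extent", "Grain 54-55"); the function names are illustrative and not taken from any IBM product.

```python
import math

VOLUME_BLOCKS = 100  # assumed: blocks 00-99, as in the figure
EXTENT_BLOCKS = 10   # assumed: one extent = 10 blocks (e.g. 90-99)
GRAIN_BLOCKS = 2     # assumed: one fine grain = 2 blocks (e.g. 54-55)


def fully_allocated_extents() -> int:
    """Every extent is reserved up front, whatever is written."""
    return VOLUME_BLOCKS // EXTENT_BLOCKS


def coarse_grain_extents(written_blocks) -> int:
    """Grain == extent: each touched extent is allocated whole."""
    return len({block // EXTENT_BLOCKS for block in written_blocks})


def fine_grain_extents(written_blocks) -> int:
    """Small grains are packed together, so extents fill densely."""
    grains = {block // GRAIN_BLOCKS for block in written_blocks}
    return math.ceil(len(grains) * GRAIN_BLOCKS / EXTENT_BLOCKS)


written = [0, 55, 99]
print(fully_allocated_extents())      # 10
print(coarse_grain_extents(written))  # 3
print(fine_grain_extents(written))    # 1
```

The fine-grain case wins here because three 2-block grains fit comfortably inside a single 10-block extent.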
10.
How IBM has implemented TP
                  IBM DS8000   IBM XIV   SVC and Storwize V7000   DS3500, DCS3700
Type              Coarse       Fine      Fine                     Fine
Allocation Unit   1 GB         17 GB     16MB to 8GB              4 GB
Grain size        -            1 MB      32-256 KB                64 KB
11.
Thick-to-Thin Migration
• Volume mirror
  – Copy 0: Fully-allocated volume
  – Copy 1: Thin-provisioned volume
• Only non-zero blocks copied
12.
Empty Space Reclaim
Thin Provisioning: allocations in 17GB units, with 1MB chunks (grains). Only non-zero blocks consume physical space.
• Avoid writing empty blocks: any I/O request that tries to write a block of all zeros to unallocated space is ignored.
• Background task to find empty chunks: a background task scans all blocks, looking for chunks containing all zeros.
• Empty space reclaimed: empty chunks are returned to unallocated space, so that they can be used for other volumes.
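The two behaviours described on this slide (ignoring all-zero writes to unallocated space, and a background scan that releases all-zero chunks) can be sketched as follows. This is a toy model under stated assumptions: the `ThinVolume` class, its whole-grain writes and its dictionary-backed storage are hypothetical and do not describe how any IBM system is implemented.

```python
GRAIN_SIZE = 1024 * 1024  # 1 MB chunk (grain), as on the slide


class ThinVolume:
    """Toy thin-provisioned volume: only non-zero grains use physical space."""

    def __init__(self, grain_size: int = GRAIN_SIZE):
        self.grain_size = grain_size
        self.grains = {}  # grain index -> bytes actually stored

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.grain_size:
            raise ValueError("this sketch only handles whole-grain writes")
        if index not in self.grains and not any(data):
            return  # all zeros written to unallocated space: ignored
        self.grains[index] = data

    def read(self, index: int) -> bytes:
        # Unallocated grains read back as zeros.
        return self.grains.get(index, bytes(self.grain_size))

    def reclaim(self) -> int:
        """Background scan: release grains that now contain only zeros."""
        empty = [i for i, data in self.grains.items() if not any(data)]
        for i in empty:
            del self.grains[i]
        return len(empty)


vol = ThinVolume(grain_size=4096)
vol.write(0, bytes(4096))      # ignored, nothing allocated
vol.write(1, b"\x01" * 4096)   # allocated
vol.write(1, bytes(4096))      # zeroed out, but still allocated
print(len(vol.grains))         # 1
print(vol.reclaim())           # 1
print(len(vol.grains))         # 0
```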
13.
Thin Provisioning
Pros
• Just-in-Time increased utilization percentage
• Eliminates the pressure to make accurate space estimates
• Dynamically expand volume without impacting applications or rebooting server
• Reduces the data footprint and lowers costs
• Shifts focus from volumes to storage pool capacity
Cons
• Not all file systems are cooperative or friendly
  – Deletion of files does not free space for others
  – “sdelete” writes zeros over deleted file space
• Some implementations may impact I/O performance
• May not support same set of features, copy services, or replication
• “Writing checks you can’t cash”
14.
Agenda
• Thin Provisioning
• Space-Efficient Copy
• Data Deduplication
• Compression
15.
History of Space-Efficient Copies
• 1993: NetApp introduces Snapshot in its WAFL file system
• 1997: IBM Enterprise Storage Server (ESS) introduces NOCOPY parameter on FlashCopy
• Today: Space-Efficient Copy is available on many IBM storage systems, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500, DS5000 and DCS3700
16.
Space-Efficient Copies
• Source: 100 GB allocated, 40 GB written
• Traditional Copies: Destination 1, Destination 2, Destination 3 = 300 GB
• Space-Efficient Copies: 10% reserved = 30 GB
17.
Method 1: Copy on Write (COW)
• Copy-On-Write (COW)
  – Copy is set of pointers to original data
  – Write to original volume:
    • Pause I/O
    • Copy original block of data to destination
    • Update original block
  – Slows performance
  – May limit # of destination copies
  – Can be combined with background copy for a full copy
[Figure: before the write, Source holds blocks A B C D; after the write, Source holds A B C2 D and Destination holds C]
18.
Method 2: Redirect on Write (ROW)
• Redirect-On-Write (ROW)
  – Copy is set of pointers to original data
  – Write to original volume:
    • Re-directed to new empty space
    • Previous data left alone
  – Does not impact performance
  – Supports many destination copies
[Figure: before the write, blocks A B C D; after the write, blocks A B C D are left in place and C2 is written to new space]
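The difference between the two methods is easiest to see side by side. The following is a minimal sketch, assuming a volume is just a list of blocks; the class names `CowVolume` and `RowVolume` and the `extra_copies` counter are hypothetical and exist only to show where the extra work on the write path comes from.

```python
class CowVolume:
    """Copy-on-write: original data is copied out before an overwrite."""

    def __init__(self, blocks):
        self.source = list(blocks)
        self.snapshot = {}     # block index -> preserved original data
        self.extra_copies = 0  # block copies made on the write path

    def write(self, index, data):
        if index not in self.snapshot:
            self.snapshot[index] = self.source[index]  # copy original first
            self.extra_copies += 1
        self.source[index] = data                      # then update in place

    def read_snapshot(self, index):
        return self.snapshot.get(index, self.source[index])


class RowVolume:
    """Redirect-on-write: new data goes to empty space, old data stays put."""

    def __init__(self, blocks):
        self.store = list(blocks)                   # physical blocks
        self.source_map = list(range(len(blocks)))  # live pointers
        self.snapshot_map = list(self.source_map)   # frozen pointers
        self.extra_copies = 0                       # never incremented

    def write(self, index, data):
        self.store.append(data)                       # write to new space
        self.source_map[index] = len(self.store) - 1  # repoint the source

    def read_source(self, index):
        return self.store[self.source_map[index]]

    def read_snapshot(self, index):
        return self.store[self.snapshot_map[index]]


cow = CowVolume(["A", "B", "C", "D"])
cow.write(2, "C2")
print(cow.source, cow.read_snapshot(2), cow.extra_copies)
# ['A', 'B', 'C2', 'D'] C 1

row = RowVolume(["A", "B", "C", "D"])
row.write(2, "C2")
print(row.read_source(2), row.read_snapshot(2), row.extra_copies)
# C2 C 0
```

The sketch leaves out space management: a second redirect of the same block simply leaves the intermediate version orphaned in `store`, which a real system would have to garbage-collect.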
19.
Space-Efficient Copies
Pros
• Supports both Fully-allocated and Thin-Provisioned Sources
• Reduces the data footprint and lowers costs
• Allows you to keep more copies online
• Allows you to take copies more frequently
• Can be used as checkpoint copies during batch processing
Cons
• Some implementations may impact I/O performance
• Requires that you estimate the maximum percentage changed
  – Typically 10-20%
• Exceeding the reserved space invalidates destination copy
20.
Agenda
• Thin Provisioning
• Space-Efficient Copy
• Data Deduplication
• Compression
21.
History of Data Deduplication
• 2007: Advanced Single Instance Store (A-SIS) brings deduplication to the IBM N series and NetApp disk storage
• 2008: IBM acquires Diligent and introduces the ProtecTIER TS7600 virtual tape library with data deduplication
• Today: IBM offers a variety of choices, including ProtecTIER, N series, and Tivoli Storage Manager (TSM v6)
22.
Data Deduplication
• Data deduplication reduces capacity requirements by only storing one unique instance of the data on disk and creating pointers for duplicate data elements
23.
Deduplication reduces disk required for backup copies
24.
Two Primary Data Deduplication Approaches
• Hash-based Deduplication
  – Sometimes referred to as a Content Addressable Storage approach
• HyperFactor
  – A different approach based on an agnostic view of data
25.
Hash-Based Approach
1. Slice data into chunks (fixed or variable): A B C D E
2. Generate Hash per chunk and save: Ah Bh Ch Dh Eh
3. Slice next data into chunks and look for Hash Match: A B C D E
4. Reference data previously stored
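The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: it uses fixed-size chunks, SHA-1 from Python's standard `hashlib` as the hash, and a hypothetical `HashDedupStore` class with an in-memory dictionary as the hash table. Hash collisions are deliberately not handled.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed fixed chunk size


class HashDedupStore:
    """Toy hash-based deduplication store with fixed-size chunks."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-1 digest (20 bytes) -> chunk stored once

    def put(self, data: bytes) -> list:
        """Store data and return its recipe: one digest per chunk."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]  # step 1: slice
            digest = hashlib.sha1(chunk).digest()          # step 2: hash
            if digest not in self.chunks:                  # step 3: look for match
                self.chunks[digest] = chunk                # new, unique chunk
            recipe.append(digest)                          # step 4: reference
        return recipe

    def get(self, recipe) -> bytes:
        """Rebuild ("re-hydrate") the original data from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = HashDedupStore(chunk_size=4)
first = store.put(b"AAAABBBBCCCC")
second = store.put(b"AAAABBBBDDDD")  # shares two chunks with the first
print(len(store.chunks))             # 4 unique chunks stored, not 6
print(store.get(second))             # b'AAAABBBBDDDD'
```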
26.
HyperFactor Approach
1. Look through data (New Data Stream) for similarity
2. Read elements that are most similar (Element A, Element B, Element C)
3. Diff reference with version – will use several elements
4. Matches factored out – unique data added to repository
27.
Assessment of Hash-based Approaches
• Applicable for all chunking methods
• Hash Table in Memory
  – Overhead for in-band deduplication
  – Hash table will grow with data volume
  – Growing hash-table may become performance bottleneck
  – Scalability issues
• Hash-Collisions must be handled
• Hash table must be protected
  – One copy might not be sufficient
Example: Imagine a chunk size of 8 KB
• 1 TB repository has ~125,000,000 8 KB chunks
• Each hash is 20 bytes long
• Need pointers scheme to reference 1 TB
• The hash-table requires 2.5 GB RAM » no issue
• With a 100 TB repository » ~250 GB of RAM is required
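The figures in this example follow from simple arithmetic, reproduced below. The sketch assumes decimal units (1 TB = 10^12 bytes, 8 KB = 8,000 bytes), since that is what yields the slide's ~125,000,000 chunks, and it counts only the 20-byte hashes, not the pointers the slide also mentions.

```python
CHUNK_BYTES = 8_000  # "8 KB" in decimal units (assumption, matches the slide)
HASH_BYTES = 20      # e.g. the size of a SHA-1 digest
TB = 10**12
GB = 10**9


def chunk_count(repository_bytes: int) -> int:
    """Number of fixed-size chunks in the repository."""
    return repository_bytes // CHUNK_BYTES


def hash_table_bytes(repository_bytes: int) -> int:
    """Memory needed for the hashes alone (pointers not included)."""
    return chunk_count(repository_bytes) * HASH_BYTES


print(chunk_count(1 * TB))              # 125000000
print(hash_table_bytes(1 * TB) / GB)    # 2.5
print(hash_table_bytes(100 * TB) / GB)  # 250.0
```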
28.
When Deduplication Occurs
1. In-line Processing
  – As data is received by the target device it is
    • Deduplicated in real time
    • Only unique data stored on disk
  – Data written to the disk storage is deduplicated
2. Post-Processing
  – As data is received by the target device it is
    • Temporarily stored on disk storage
  – Data is subsequently read back in to be processed by a deduplication engine
29.
Comparison of Offerings
In-line Process
• Hash-based: Other vendors
• HyperFactor: IBM ProtecTIER
  – TS7680G
  – TS7650G
  – TS7650
  – TS7620 Express
  – TS7610 Express
Post-Process
• IBM Tivoli Storage Manager (TSM)
• N series
30.
IBM ProtecTIER with HyperFactor
• Gateways
  – Attaches up to 1PB of disk
  – Two models:
    • TS7680 for IBM System z
    • TS7650G for distributed systems
• Appliances
  – Disk included inside
  – Three models for distributed systems
    • TS7650 ... in three sizes
    • TS7620 (New!)
    • TS7610 ... in two sizes
31.
ProtecTIER vs. Tivoli Storage Manager
Both Solutions Offer the Benefits of Target side Deduplication:
  – Greatly reduced storage capacity requirements
  – Lower operational costs, energy usage and TCO
  – Faster recoveries with more data on disk
Complementary Solutions Today! Can be used together, but don’t deduplicate the same data twice.
Use ProtecTIER (IBM TS7600) When:
  – Highest performance and capacity scaling are required!
  – Up to 1400 MB/sec (2.5GB/s with 2 node) deduplication rates are needed
  – Deduplicated capacities up to 25 PB are required
  – You wish to avoid operational impact of post processing deduplication
  – A VTL appliance model is desired
  – Deduplicating across multiple TSM (or other backup) servers
Use TSM 6 Built-in Deduplication When:
  – You desire deduplication operations be completely integrated within TSM
  – The benefits of deduplication are desired without separate hardware or software dependencies or licenses (ships with TSM Extended Edition)
  – You desire end to end data lifecycle management with minimized data store
32.
Data Deduplication
Pros
• Designed for backups
• Can offer up to 25x data footprint reduction
  – Allows disk backup repositories to approach cost of tape-based solutions
• Allows more backup copies to remain on disk for faster restores
• Available with a variety of interfaces, including VTL, OST and NAS
Cons
• Dealing with Hash Collisions
  – May require byte-for-byte comparisons or keeping secondary copy of data
• Some systems do not scale
• Some systems have slow restores
  – Re-hydrating data back to normal
• Primary data may not dedupe very well
  – Your mileage may vary!
33.
Agenda
• Thin Provisioning
• Space-Efficient Copy
• Data Deduplication
• Compression
34.
History of Compression
• 1973: NASA and IBM developed the Houston Aerospace Spooling Protocol (HASP) with compression for long distance data transmission
• 1986: IBM introduced the Improved Data Recording Capability (IDRC) for the 3480 tape drive
• Today: IBM offers real-time compression for file and block level access to disk storage
35.
Lossy vs. Lossless Methods
• Lossy
  – Compress, then Decompress does not return data back to its original contents ("Good enough?")
  – Used with music, photos, video, medical images, scanned documents, fax machines
• Lossless
  – Compress, then Decompress returns data back to its original contents ("Exactly the same")
  – Used with databases, emails, spreadsheets, office documents, source code
36.
How Compression Works
• Lempel-Ziv lossless compression builds a dictionary of repeated phrases: sequences of two or more characters that can be represented with fewer bits
• In the excerpt from “Lord of the Rings” shown on the slide, all of the red text represents repeated sequences eligible for compression!
Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
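To make the dictionary idea concrete, here is a minimal sketch of LZ78, one member of the Lempel-Ziv family, chosen for brevity. It is an illustration only: the slide does not say which Lempel-Ziv variant is meant, and production compressors add bit-level encoding that is omitted here. Each output token is a pair (index of a previously seen phrase, next character), and the sample sentence is made up for the demonstration.

```python
def lz78_compress(text: str) -> list:
    """Encode text as (phrase index, next character) tokens."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase already seen
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:
        tokens.append((dictionary[phrase], ""))  # flush the tail
    return tokens


def lz78_decompress(tokens: list) -> str:
    """Rebuild the text by replaying the same dictionary growth."""
    phrases = [""]
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


sample = "one ring to rule them all, one ring to find them, " * 4
tokens = lz78_compress(sample)
print(lz78_decompress(tokens) == sample)  # True: lossless round trip
print(len(tokens) < len(sample))          # True: fewer tokens than characters
```

The more often a phrase repeats, the longer the dictionary entries become and the more characters each token covers.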
37.
Compressed Volumes
• Fully allocated: allocated but unused space dedicated to this host, wasted until written to
• Thin provisioned: host sees full virtual amount; physical space allocated for actual data written
• Compressed: physical space allocated, up to 80% reduction from actual data written
38.
Real-time Compression!
• Real-time Compression for primary data
  – Less data stored on primary storage (up to 80%)
  – No changes to applications or procedures
• Before it gets to the storage array
  – Larger effective storage cache
  – Disk Array can serve more requests from its read / write cache
  – Lower storage CPU overhead
• Does not cause performance degradation
  – Much smaller I/O / lower disk workload
  – Reads/Writes are faster due to storage array’s response from cache instead of disk
  – Additionally reads may come from advanced read ahead cache (no write cache)
[Figure: Workstations, IP Network, Application Servers, Cache, Disk Array]
39.
FIVO vs. VIFO
• Fixed Input, Variable Output
  – WAN transmission
  – Sequential tape
  – IBM Tivoli Storage Manager
  – zip, tar, etc.
• Variable Input, Fixed Output
  – Random Access Compression Engine™ (RACE)
  – IBM Real-Time Compression Appliances
  – IBM SVC, Storwize V7000
[Figure: data chunks 1-6 and their compressed counterparts, shown for each approach]
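The two packing strategies can be contrasted with a small sketch built on Python's standard `zlib` module. It is a rough illustration only: the greedy fill loop, the 512-byte block size and the helper names are assumptions made for this example, and nothing here describes how RACE is actually implemented.

```python
import zlib


def fivo(data: bytes, input_size: int = 4096) -> list:
    """Fixed input, variable output: compress equal-sized input chunks."""
    return [zlib.compress(data[i:i + input_size])
            for i in range(0, len(data), input_size)]


def vifo(data: bytes, block_size: int = 512, step: int = 64) -> list:
    """Variable input, fixed output: greedily fill fixed-size blocks.

    Returns (start, end, block) tuples; each block is padded to block_size.
    """
    blocks = []
    start = 0
    while start < len(data):
        end = min(start + step, len(data))
        packed = zlib.compress(data[start:end])
        if len(packed) > block_size:
            raise ValueError("block_size too small for one step of input")
        while end < len(data):
            next_end = min(end + step, len(data))
            candidate = zlib.compress(data[start:next_end])
            if len(candidate) > block_size:
                break  # one more step would overflow the block
            packed, end = candidate, next_end
        blocks.append((start, end, packed.ljust(block_size, b"\0")))
        start = end
    return blocks


def vifo_read(block: bytes) -> bytes:
    # decompressobj stops at the end of the stream and ignores the padding
    return zlib.decompressobj().decompress(block)


data = b"thin provisioning and compression " * 500
variable_out = fivo(data)
fixed_out = vifo(data)
print(sorted({len(chunk) for chunk in variable_out}))  # output sizes vary
print({len(block) for _, _, block in fixed_out})       # always {512}
```

Because every VIFO block has the same size, a simple map from input range to block number is enough to locate data for random access, which is the property the slide associates with this approach.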
40.
Compression for Disk data
Traditional Approaches
• Extra work to ‘edit’ a file
• All blocks shift
  – Only one common block (this example)
  – Negative impact to deduplication
• No notion of data location
Real-time Compression
• Small amount of work / I/O to edit
• Only modified block changes
  – Multiple common blocks
  – Enhances deduplication
• Data location via map
[Figure: a file of blocks ABC DEF GHI is modified (E replaced by MN). With traditional compression the blocks after the change shift in the new compressed file; with real-time compression the unmodified blocks stay identical.]
41.
Compression Without Compromise
Expected Compression Ratios
• Databases: Up to 80%
• Server Virtualization
  – Linux virtual OSes: Up to 70%
  – Windows virtual OSes: Up to 55%
• Collaboration
  – Office 2003: Up to 75%
  – Office 2007 or later: Up to 25%
• Engineering/Design
  – CAD/CAM: Up to 75%
42.
Objectives:
• Run over a block device
• Estimate:
– Portion of non-zero blocks in the volume
– Compression rate of non-zero blocks with Real-time Compression (RTC)
Performance:
• Runs FAST! < 60 seconds, no matter what the volume size
– Typical running time on a machine with multiple disks: < 20 seconds
• Gives guarantees on the estimation: ~5% max error guarantee
– Can improve the guarantee with more running time
Method:
• Random sampling and compression throughout the volume
• Collect enough non-zero samples to gain the desired confidence
– More zero blocks means a slower run (it takes more time to find non-zero blocks)
• Mathematical analysis gives confidence guarantees
• Note: we are estimating compression during migration of a volume into RTC (data at rest)
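The sampling method can be sketched as follows. This is not the IBM tool's code: zlib stands in for RTC, the 4096-byte sample size is arbitrary, and the Hoeffding inequality is one standard way to turn "max error" and "confidence" into a sample count (the slides do not say which analysis is actually used). All function names are hypothetical.

```python
import math
import random
import zlib

BLOCK = 4096  # sample size in bytes (an assumption for this sketch)

def samples_needed(max_error: float, failure_prob: float) -> int:
    """Hoeffding bound: samples needed so that the mean of values in [0, 1]
    is within `max_error` of the truth with probability 1 - failure_prob."""
    return math.ceil(math.log(2 / failure_prob) / (2 * max_error ** 2))

def estimate(volume: bytes, max_error: float = 0.05,
             failure_prob: float = 0.01, seed: int = 0):
    """Return (non-zero fraction, mean compressed/original ratio of non-zero blocks)."""
    rng = random.Random(seed)
    n_blocks = len(volume) // BLOCK
    target = samples_needed(max_error, failure_prob)
    limit = 100 * target          # give up on almost entirely empty volumes
    zero = bytes(BLOCK)
    probes = nonzero = 0
    ratio_sum = 0.0
    while nonzero < target and probes < limit:
        i = rng.randrange(n_blocks)              # random block, with replacement
        block = volume[i * BLOCK:(i + 1) * BLOCK]
        probes += 1
        if block == zero:
            continue                             # zero blocks cost time but add no sample
        nonzero += 1
        ratio_sum += min(1.0, len(zlib.compress(block)) / BLOCK)
    if nonzero == 0:
        return 0.0, None
    return nonzero / probes, ratio_sum / nonzero

if __name__ == "__main__":
    text = (b"customer row with some repeated fields " * 200)[:BLOCK]
    demo = b"".join(text if i % 2 else bytes(BLOCK) for i in range(256))
    print(samples_needed(0.05, 0.01), estimate(demo))
```

The number of samples depends only on the desired error and confidence, not on the volume size, which is consistent with a run time that stays roughly constant for any volume. A volume with many zero blocks needs more probes to collect the same number of non-zero samples.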
43.
IBM Real-Time Compression
• For NAS devices
– IBM Real-Time Compression Appliance (STN 6500, STN 6800)
• For Block devices
– SAN Volume Controller
– Storwize V7000
44.
Migrating to Compressed Disk
[Diagram: volume mirror from a fully-allocated volume (Copy 0) to a compressed or thin-provisioned volume (Copy 1)]
• Only non-zero blocks copied
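The idea of copying only non-zero blocks during the mirror can be sketched in Python. This is a toy model, not the SVC or Storwize implementation: the target copy is a dictionary from block index to compressed data, zlib is the stand-in compressor, and the source length is assumed to be a multiple of the 4096-byte block size.

```python
import zlib

BLOCK = 4096  # block size in bytes (an assumption for this sketch)

def migrate(source: bytes) -> dict[int, bytes]:
    """Copy a fully-allocated volume into a sparse, compressed target.

    Blocks that contain only zeros are skipped, so they take no space.
    """
    target = {}
    for offset in range(0, len(source), BLOCK):
        block = source[offset:offset + BLOCK]
        if any(block):                      # at least one non-zero byte
            target[offset // BLOCK] = zlib.compress(block)
    return target

def read_block(target: dict[int, bytes], index: int) -> bytes:
    """Read one block back; unmapped blocks read as zeros."""
    packed = target.get(index)
    return zlib.decompress(packed) if packed is not None else bytes(BLOCK)

if __name__ == "__main__":
    data = (b"payload " * 512)[:BLOCK]
    volume = data + bytes(BLOCK) * 6 + data
    copy1 = migrate(volume)
    stored = sum(len(v) for v in copy1.values())
    print(f"{len(volume)} bytes allocated, {len(copy1)} blocks copied, {stored} bytes stored")
```

Reading every block index back through read_block reproduces the original volume, while the zero blocks never occupy space on the target copy.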
45.
Data Compression
Pros
• Can be used for data transmission, tape and disk data
• Can offer up to 80% data footprint reduction
• Available as front-end appliance or integrated into storage system
• Can be “Dedupe-Friendly”
Cons
• Some implementations are post-process
– Stores uncompressed data first, compresses later
• Some implementations impact performance and/or consume substantial CPU resources
• Benefits vary by data type, and whether applications do their own compression or encryption
– Your mileage may vary
46.
Thank You!
Session: sSE20
Presenters: Tony Pearson, Sanjay Bhikot
#IBMEDGE
Intel, the Intel logo, Xeon and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and/or other countries.
47.
Additional Resources
Email: tpearson@us.ibm.com
Twitter: http://twitter.com/az990tony
Blog: http://ibm.co/brAeZ0
Books: http://www.lulu.com/spotlight/990_tony
IBM Expert Network: http://www.slideshare.net/az990tony
48.
Trademarks and disclaimers
© IBM Corporation 2012. All rights reserved. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of The Minister for the Cabinet Office, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Other product and service names might be trademarks of IBM or other companies. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. 
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. References in this document to IBM products or services do not imply that IBM intends to make them available in every country.