CIO Perspectives: Opportunities in
Managing the Copy Data Explosion

Erik-Jan Dubóvik
Chief Information Officer
Audax Group

This presentation, including any supporting materials, is owned by Gartner, Inc. and/or its affiliates and is for the sole use of the intended Gartner audience or other
authorized recipients. This presentation may contain information that is confidential, proprietary or otherwise legally protected, and it may not be further copied,
distributed or publicly displayed without the express written permission of Gartner, Inc. or its affiliates.
© 2012 Gartner, Inc. and/or its affiliates. All rights reserved.
About Audax Group
• Background
- Founded 1999, ~140 ppl, offices in Boston & New York

- Investor in lower-middle market companies
- Manage over $5B of assets through our private
equity, mezzanine debt, and private senior debt
businesses
Copy Data Management Visualized
Status Quo
Infrastructure-Centric Data Management

1 Redundant – Multiple silos, same 4 primitives
2 Complex – Keep adding to relieve “symptoms”
3 Slow – Moving lots of data across networks
DUPLICATION + INFRASTRUCTURE +
OPERATIONS + COMPLEXITY + COST

Information-Centric Data Management

1 Flexible – Any environment (virtual, hybrid…)
2 Simple – One integrated data protection app
3 Fast – Data mounts directly to production
A whole new market category…

13 March 2013 ID:G00248888

To go from good to great, storage administrators should
evaluate these types of tools:

“Copy data management: These products can
perform a host of functions, including backup,
archiving, replication and creation of test data
using a minimal number of copies.”
…And a ‘Best Practice’
Best Practices for Repairing the Broken State of Backup
“The notion of copy data
management — which reduces the
proliferation of secondary copies of
data for backup, disaster
recovery, testing and reporting — is
becoming increasingly important to
contain costs and to improve
infrastructure agility.”
15 August 2013 ID:G00252768

Dave Russell
VP Distinguished Analyst
Copy Data Growth Drivers
Q: What are the reasons for growth of secondary data copies?
Increased number of applications
More copies per application are created
Larger size of secondary copies to be created
Regulatory requirements to store data for a
specific period of time
New/expanded use of business analytics
Lack of data copy management tools and/or
practices
Other

[Bar chart: % of respondents citing each reason, scale 0% to 80%; N=556]
The Power of Copy Data Management
Tools Landscape

Categories: Replication, Dedup, Backup, Snapshot, Tiering

Products shown (grouped as on the slide): RecoverPoint, SRDF, MirrorView; DataDomain, Avamar; Avamar, Networker; Remote Copy, Continuous Access; SnapMirror; StoreOnce; DataProtector; Timefinder, RM, SnapView; Virtual Copy, EVA Snapshot; FAST; Adaptive Optimization; Inmage, True Copy; HDIM; SyncSort, CommVault, NetBackup; SnapShot, AST; CommVault; Shadow Image, CoW Snapshot; SmartTiers; VVR; PureDisk; NetBackup, BackupExec; RealTime, VxFS DST
Context and Problem
• Situation
- Resource & time intensive business processes require
immediate systems performance and limited downtime
- 5 ESX Hosts, 50 servers, 16TB storage, Dual LTO4
- 500k emails/mo (3,500/FTE); annual data growth 10%

• The Problem
- Backup window entering business day
- Business continuity technology didn’t protect all
systems & relied on tape* for server restoration
- Level 1 RTOs range from 5hrs (SQL) to 12hrs (email) and 48hrs (file)

- Backup email service not acceptable for multi-hour use
* If tapes are corrupt, RPO grows to 7 days or longer.
Objectives
• Justification & Business Case
- Fully protect all company systems

- Eliminate need for expensive Tier 1 storage
- Establish Co-Lo for systems and personnel
- Free-up expensive real estate (i.e., NY Server Room)
- Avoid growing IT staff

• Specific Goals & Timeline
- 3 month project start-to-finish
- Major improvement of RPO/RTO
Objectives: RPO/RTO

                    Previous Capability    Actifio Target
Level 1 RPO/RTO     24hr/80hr              3hr/15min
Level 2 RPO/RTO     24hr/8.7 days          18hr/30min
Level 3 RPO/RTO     24hr/19.6 days         24hr/45min

Graphic source: Wikipedia
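
To put the RTO figures on this slide side by side, the sketch below converts each previous and target RTO to minutes and divides one by the other. The function name `rto_improvement` and the dictionary layout are illustrative choices, not part of the original deck; the numbers are the ones quoted on the slide, and the Actifio values are targets rather than measured results.

```python
# RTO values from the slide, converted to minutes: (previous, Actifio target).
RTO_MINUTES = {
    "Level 1": (80 * 60, 15),         # 80hr -> 15min
    "Level 2": (8.7 * 24 * 60, 30),   # 8.7 days -> 30min
    "Level 3": (19.6 * 24 * 60, 45),  # 19.6 days -> 45min
}


def rto_improvement(previous_min: float, target_min: float) -> float:
    """Return how many times shorter the target RTO is than the previous one."""
    if target_min <= 0:
        raise ValueError("target_min must be positive")
    return previous_min / target_min


for level, (previous, target) in RTO_MINUTES.items():
    factor = rto_improvement(previous, target)
    print(f"{level}: target RTO is roughly {factor:.0f}x shorter")
```

For Level 1, 80 hours against a 15-minute target works out to a factor of 320.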
Approach: Options
• Alternatives Considered
- Expand existing host-based replication software
(DoubleTake, WANSync)
- Veeam + new storage
• Pushing limits of tech at a comparatively higher cost

• Considerations
Failover: How long to “spin up”
server in Production site? DR?

Application support: Linux,
Exchange, SQL Server,
SharePoint?

Storage: How much required?
De-dupe/compression (important
if using one device for backup)?

Replication: Site-to-site on-premise capable? Site-to-Cloud
(If so, what limitations, if any)?

Severability vs. Integration:
Acceptable risk if part of VM
environment (vs. standalone)?

Data Restore: Server vs. item-level? Number of snapshots?
How long to “spin up” server?

Cost: Savings from HW/SW
elimination, avoidance &
downsizing? Staff optimization?

Timing: Natural refresh cycle of
related HW/SW (e.g., storage,
dedupe, backup, data center)?

Connectivity: Local
environment (Fibre vs. iSCSI)?
WAN (1MB/5/10/100/1GB)?
Approach
• Strategies
- Engage business management to participate in people/
process change and define system priorities
- Embrace opportunity around architecture change

• Technologies Leveraged
- Actifio, VMware, Cisco, Metro-E (100MB)
Our Actifio Environment
SITE A: PRODUCTION

SITE B: FAILOVER

- Capture: ingest server ONCE, only changed blocks (zero backup window)
- Instantly mount recovered data (zero restore window)
- Recreate data on demand
- Store: only unique blocks (10X lower storage)
- Incremental restore for BC
- Move only unique blocks (70% less bandwidth)
- Instantly mount recovered data (zero restore window)
- Recreate data on demand
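
The "only unique blocks" and "recreate data on demand" ideas on this slide can be illustrated with a generic content-hash deduplication sketch. This is not Actifio's implementation, which the deck does not describe; it assumes fixed-size blocks and SHA-256 fingerprints, and the names `BLOCK_SIZE`, `ingest`, and `recreate` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def ingest(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Store only blocks not seen before; return the block recipe and new-block count."""
    recipe = []
    new_blocks = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            new_blocks += 1
        recipe.append(digest)
    return recipe, new_blocks


def recreate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild a point-in-time copy from its recipe of block fingerprints."""
    return b"".join(store[digest] for digest in recipe)


store: dict[str, bytes] = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"  # one block changed since Monday

recipe_mon, new_mon = ingest(monday, store)
recipe_tue, new_tue = ingest(tuesday, store)
print(new_mon, new_tue, len(store))  # 4 1 5
```

Two full copies are recoverable, yet only five blocks are stored: the second ingest adds just the one changed block.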
Challenges & Results
• Biggest Challenges
- Overly aggressive protection SLAs @ start

- Multiple power outages during transition
- Metro-E providers didn’t provide “true” Layer 2

• How Did We Overcome Them (Or Not)?
- Increased RPOs for Level 2 & 3 systems
- Stopped synchronization for 18 hours to re-index
system
- Implemented Network Interface Devices (NIDs) to route
all Layer 2 traffic (necessary for Metro-E High
Availability)
Challenges & Results
• Results: $ and Intangible
- Increased short-term costs, but $150k less than
alternative.
- Met all RPO/RTO objectives; didn’t meet timeline
• Metro-E networking issues were unforeseen

• Upside Surprises
- Added near real-time restoration of item-level objects
from any backup of Exchange & SharePoint
- Decided to move Production to Co-Lo; new storage
implementation to be handled through Actifio
Lessons Learned & Recommendations
• Lessons Learned
- Engage telecom carrier Engineering early on

- Use project as opportunity to review Business
Continuity on a holistic basis
- Partner w/ cross-functional vendor (storage, backup)

• What Would We Do Differently?
- Less aggressive with Level 2 & 3 SLAs @ start
- Test network technology earlier & more often
Quantifying The Problem
The Copy Data Ratio (CDR)

CDR = (Total Data in Environment (TB) / Total Amount of Production Data (TB)) x 100

Example: (45TB / 8TB) x 100 = 562.5, rounded to 563
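
The ratio on this slide is simple enough to compute in a few lines. The sketch below uses the slide's own example figures; the function names are illustrative. It rounds half up explicitly because Python's built-in `round` rounds halves to the nearest even number and would turn 562.5 into 562.

```python
def copy_data_ratio(total_tb: float, production_tb: float) -> float:
    """Copy Data Ratio: total data divided by production data, times 100."""
    if production_tb <= 0:
        raise ValueError("production_tb must be positive")
    return total_tb / production_tb * 100


def round_half_up(value: float) -> int:
    """Round a non-negative value to the nearest integer, with .5 going up."""
    return int(value + 0.5)


cdr = copy_data_ratio(total_tb=45, production_tb=8)
print(cdr)                 # 562.5
print(round_half_up(cdr))  # 563
```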
Quantifying The Problem
The Copy Data Ratio (CDR)
What’s Your Number?

CDR range      Assessment
100 – 150      Optimistic
150 – 350      Opportunistic
350 – 700      Urgency
700 – 1,000    Crisis

Example score: 563 (Urgency)
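
A score can be mapped to the bands on this slide with a small lookup. The slide's ranges share their endpoints (150, 350, 700), so this sketch assumes each band includes its lower bound and excludes its upper bound, and it treats anything at or above 1,000 as "Crisis". Both choices are assumptions, not something the slide states.

```python
# (lower bound inclusive, upper bound exclusive, label) -- boundary handling is assumed.
BANDS = [
    (100, 150, "Optimistic"),
    (150, 350, "Opportunistic"),
    (350, 700, "Urgency"),
    (700, 1000, "Crisis"),
]


def cdr_band(cdr: float) -> str:
    """Return the slide's assessment label for a Copy Data Ratio score."""
    if cdr < 100:
        # Total data cannot be smaller than the production data it contains.
        raise ValueError("CDR below 100 is not meaningful")
    for low, high, label in BANDS:
        if low <= cdr < high:
            return label
    return "Crisis"  # assumption: scores of 1,000 and above stay in the top band


print(cdr_band(563))  # Urgency
```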
Evaluating CDR Score in Relation to
Operational Complexity
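Vertical axis: Tools in Use (Low to High)
Horizontal axis: Copy Data Ratio (Low to High)

1 (high Tools in Use, high CDR): Transformational opportunity for savings, efficiency gains
2 (low Tools in Use, high CDR): Large opportunity for savings, efficiency gains
3 (high Tools in Use, low CDR): Opportunity for savings, some efficiency gains
4 (low Tools in Use, low CDR): Limited savings, efficiency opportunities

Example score plotted: 563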
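
The quadrant chart can be read as a two-way lookup on CDR and number of tools in use. The slide does not say where "Low" ends and "High" begins on either axis, so the thresholds in this sketch are caller-supplied, and the values used in the example call are purely hypothetical.

```python
# Key: (many tools in use?, high CDR?) -> (quadrant number, outlook from the slide)
QUADRANTS = {
    (True, True): (1, "Transformational opportunity for savings, efficiency gains"),
    (False, True): (2, "Large opportunity for savings, efficiency gains"),
    (True, False): (3, "Opportunity for savings, some efficiency gains"),
    (False, False): (4, "Limited savings, efficiency opportunities"),
}


def quadrant(cdr: float, tools_in_use: int,
             cdr_threshold: float, tools_threshold: int) -> tuple[int, str]:
    """Place an environment in one of the slide's four quadrants."""
    many_tools = tools_in_use >= tools_threshold
    high_cdr = cdr >= cdr_threshold
    return QUADRANTS[(many_tools, high_cdr)]


# Hypothetical thresholds and tool count; only the CDR of 563 comes from the deck.
number, outlook = quadrant(cdr=563, tools_in_use=6,
                           cdr_threshold=350, tools_threshold=5)
print(number, outlook)
```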
Summary
• Copy data is a source of significant spend and
inefficiency in the enterprise
• Impact felt most severely on revenue-generating and
business-agility initiatives
• Delays / issues due to resource drain from copy data
sprawl
• Important to understand the magnitude of the
problem
• Calculating the Copy Data Ratio (CDR) can help
influence an action plan based on effort / impact
analysis

