Recover, Restore, Resume:
Protecting the Company Jewels
A Next Generation Data Protection
Industry Primer
KEY CONTACTS
°	 Rick Dalton
	 Managing Director
	 rdalton@svballiant.com
	 650.330.3799
°	 Melody Jones
	 Vice President
	 mjones@svballiant.com
	 650.330.3076
IN THIS ISSUE
°	 A Brief History of Data Protection
°	 Next Generation Data Protection (NGDP) Technologies
°	 NGDP Direction Over the Next Three to Five Years
°	 Select Storage Industry Transactions and Comparables
January 2007
Prognosticating the Next Generation of Data Protection
TABLE OF CONTENTS
1	 Executive Summary
2	 Introduction
3	 A Very Brief History of Data Protection
4	Fertile Ground for Innovation
Figure 1: NGDP Technology Grid  
5	 Next Generation Data Protection (NGDP) Technologies and Markets
Table 1: Select Companies in the NGDP Market
6	 Virtual Tape Libraries (VTLs)
Figure 2: Open Server VTL
Figure 3: Mainframe VTL
Table 2: Select VTL Products on the Market
11	 Continuous Data Protection (CDP)
Figure 4: CDP
Table 3: Select CDP Products on the Market
16	 Continuous Snapshot (a.k.a. Small Aperture Snapshot or SAS): Near CDP or CDP-like
Figure 5: Continuous Snapshot, SAS
Table 4: Select SAS Products on the Market
18	 Distributed Remote Office Branch Office (ROBO) Backup to Disk
Figure 6: Distributed ROBO Backup to Disk
Table 5: Select Distributed ROBO Backup-to-disk Products on the Market
21	 De-duplication—a.k.a. Single Instance Storage (SIS) and Global Single Instance Storage (GSIS)
Figure 7: De-duplication—SIS/GSIS
Table 6: Select De-duplication Products on the Market
23	 Security and Encryption – In-flight and At-rest
Figure 8: Security and Encryption
Figure 9: Record Retention Periods Mandated by Various Regulations in the United States
27	 Market Research
Market Requirements
Figure 10: Data Protection Survey U.S. SME
What the Numbers Mean
Figure 11: Data Protection Survey U.S. SMB
29	 NGDP Direction over the Next Three to Five Years
31	 Conclusions
	
32	 Table 7: Select Storage Industry M&A Transactions
33	 Table 8: Select Storage Industry Company Comparables
34	 Table 9: Interview Tables and Table 10: Acronyms and Abbreviations
35	 Select SVB Alliant Transactions
36	 About SVB Alliant
36	References
Executive Summary
It is often said that the only constant in technology
is change. Never has this been more true than with
the current state of the data protection market.
In the past, data protection had the justifiably
earned reputation of being a fairly stable and static
market. It was primarily known as simply backup-
to-tape or storage array-based replication. Backup-
to-tape market changes were typically evolutionary
and incremental. Often the changes involved higher-
density and faster tapes, speedier tape drives, better
robots with greater-density tape libraries, or more
operating systems and databases covered by the
backup software.
Changes in storage array replication were also
evolutionary, usually involving different physical
interfaces (such as gigabit Ethernet, faster fibre channel,
fiber connectivity [FICON]), more snapshots, higher
capacities, or incremental snapshots.
The 21st century era of regulatory and legal
compliance, the swelling threats to internal and
external security, and the exponential increase in
electronic data discovery (e-discovery) around
litigation have changed the data protection market
forever. Data protection must do more now than
ever before. Protected data must be preserved (often
for lengthy periods), classified, searched, migrated
by policy or age to lower-cost storage, compressed,
de-duplicated, encrypted, restored on command,
and eventually provably destroyed.
The first of next generation data protection (NGDP)
products have already exploded onto the market.
These include virtual tape libraries (VTLs), continuous
data protection (CDP), continuous snapshot (a.k.a.
small aperture snapshot [SAS]), distributed backup to
disk from remote and branch offices, de-duplication,
and security including encryption.
This paper examines the new technologies of the
NGDP market, going past the trade press hype
and digging into their real value propositions. It
then compares most of the players in each of the
technology market niches. Finally, it predicts how
the NGDP market will rapidly evolve over the next
three to five years.
Introduction
The staid storage software market no longer
deserves the reputation of being a weary bore. It
has been experiencing rapid and radical change.
It is no longer good enough to simply store data
and back it up in case of a disaster. Today’s data is
stored, classified, aged, moved to the appropriate
storage tier based on application and data value
(determined by organizational usage as well as
regulations or laws), moved again (based on aged
value), encrypted, sorted, archived, and even
digitally shredded. Known as information life cycle
management (ILM), it is rapidly being implemented
from medium to large IT organizations as part of
their tiered storage strategy.
ILM is the tip of the iceberg. The unprecedented
expansion of stored data, doubling every 12 to
24 months on average, is causing the data protection
market to undergo far-reaching changes as evidenced
by the explosive proliferation of new technologies,
products, startups, and perceived new markets.
It’s not just the changes and the growth in the
primary data storage market that are driving the
changes in the data protection market. Identity
theft, regulatory compliance, e-discovery, smarter
cyber-crooks, plus an exponential increase in virus
attacks, worms, and bots are also driving it. It seems
as if every week a new report of lost, pilfered,
purchased, or stolen data appears in the news.
Well-known brand names are being embarrassed
by human error and outright theft including such
easily recognizable names as ABN Amro, Bank of
America, Citibank, Fidelity Investments, Hotels.com,
Marriott, the U.S. Veterans Administration,
and the list goes on.
Then there is the increasing mission-critical reliance
on e-mail and instant messaging (IM). E-mail/IM/
BlackBerry servers such as Microsoft Exchange can
and often do crash, leaving organizations scrambling
for hours, days, weeks, and even months to recover
their data.
Legislative bodies, regulatory agencies, and numerous
worldwide consumer groups are taking more than
a little interest. There is now extensive, albeit
sometimes vague, legislation and regulations about
data protection, retention, recoverability, and even
encryption specifying liability costs for failure to
ensure data privacy in the following areas:
°	 Finance
•	 Federal Reserve Board regulations
•	 Securities and Exchange Commission (SEC)
Rule 240 17(a)-4
• 	NASD Rule 2211
°	 Health care
•	 Health Insurance Portability and Accountability
Act (HIPAA)
°	 Pharmaceuticals
•	 21 Code of Federal Regulations (CFR) Part 11
•	 European Union (EU) Annex 11
°	 Commerce risk management
•	 Gramm-Leach-Bliley Act (GLBA)1
°	 Corporate governance
•	 Sarbanes-Oxley Act
°	 Risk management
•	 Basel II Capital Accord (international)
°	 Money laundering directives from the United
States and the European Union
•	 USA PATRIOT Act
•	 EU Data Protection Directive (EU DPD)2
In addition, more than two dozen U.S. states have
laws on the books, based on California Assembly
Bill No. 1950, that spell out detailed, immediate
notification requirements when a consumer's data
is lost or stolen if it is unencrypted; the operative
word in this legislation is unencrypted.
Then there is the cost of compliance failures.
°	 The New York Data Law A.4254 states that failure
to disclose data breaches can result in fines up to
$150,000 per incident.
°	 In February 2006 U.S. investment bank Morgan
Stanley offered to pay $15 million to resolve an
investigation by U.S. regulators into the bank’s
failure to retain e-mail messages. E-mail took
center stage in a $1.58 billion judgment against
the company in a case that centered on the bank’s
inability to produce e-mail documents. The bank
said that backup tapes had been overwritten.
°	 In Zubulake v. UBS Warburg, a gender-
discrimination suit, the judge instructed the
jury that it was legitimate to presume that the
information UBS Warburg couldn’t provide due
to lost backup tapes and e-mails was probably
damaging to the company’s case. Zubulake
was awarded $20 million. Many vendors cite
this precedent as an example and a reason for
customers to purchase their NGDP solutions.
This is only the beginning. More laws and regulations
are on the way. To understand a bit more about where
data protection is going requires some knowledge of
where it has been.
In the beginning data protection was driven by the
need to recover an organization’s data in the event of
a disaster. Disasters were originally narrowly defined
as the natural type (floods, hurricanes, typhoons,
tsunamis, earthquakes, volcanic explosions, tornados,
fires, and the like) or the man-made type (fires,
explosions, terrorist attacks, and human error). As
horrific as these events can be and often are, they
are relatively rare. Most organizations did not, and
still do not, want to use high-priced primary (a.k.a.
premium) disk storage for rarely used data.
This scenario led to the development of lower-
cost data storage. The first mass-market lower-
cost storage for disaster recovery was tape. Tape
was, and to a large extent today still is, much less
expensive per gigabyte than primary disk storage on
a raw and usable basis. Even data management and
infrastructure were less expensive.
As data processing evolved over time and became
more distributed and user driven, so too did data
protection. Disaster recovery definitions expanded
to include those caused by humans, including data
loss from malware, errors, deletions, disgruntled
employees, and criminals both inside and outside of
the organization.
As end-user work became more tied to desktop and
laptop personal computers (PCs), new requirements
began entering the data protection picture. Data
protection that works for the IT organization as a
whole does not necessarily meet the needs of the
individual user. Disaster recovery was proving to be
far too narrow a definition for data protection. Users
wanted a lot more—such as the ability to recover
individual lost, deleted, or damaged files. Additionally,
they did not want the hassle or time waste of restoring
entire volumes to recover just a single file.
A Brief History of Data Protection
Organizational needs, typically data center focused,
were also evolving. Global trade and the Internet
changed the concept of operational hours, making
it easier to accomplish nondisruptive backups. The
amount of data that could be lost (recovery point
objective, or RPO) plus the amount of time required
to be up and operational (recovery time objective,
or RTO) was rapidly decreasing. Technologies such
as storage-based snapshots were developed to help
address these issues. And as organizational RPOs
and RTOs continued to decrease, snapshots evolved,
adding incremental snaps of data plus significantly
more of them, further driving increased storage
requirements.
Backup and recovery are generally measured by two
concepts: recovery time objective (RTO) and recovery
point objective (RPO). The RTO defines the maximum
amount of downtime a company is willing to endure
while dealing with a recovery event. The RPO defines the
maximum amount of data loss the company can endure
in the same situation. For example, consider a company
that has defined a two-hour RPO and a one-hour RTO. In
the face of a data corruption event at noon, the company
expects the systems to be back online and functional no
later than 1:00 p.m., and the data that is used for the
restoration is no older than from 10:00 a.m.
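The sidebar's arithmetic can be sketched as a quick compliance check (an illustrative sketch only; the function name and signature are our own, not from any vendor's product):

```python
from datetime import datetime, timedelta

def meets_objectives(failure, restore_point, back_online, rpo, rto):
    """Return (rpo_met, rto_met) for a recovery event.

    failure       -- when the corruption or outage occurred
    restore_point -- timestamp of the data used for restoration
    back_online   -- when systems were functional again
    rpo, rto      -- the company's stated objectives (timedeltas)
    """
    data_loss = failure - restore_point   # how much data was lost
    downtime = back_online - failure      # how long systems were down
    return data_loss <= rpo, downtime <= rto

# The sidebar's example: corruption at noon, two-hour RPO, one-hour RTO.
noon = datetime(2007, 1, 15, 12, 0)
rpo_met, rto_met = meets_objectives(
    failure=noon,
    restore_point=datetime(2007, 1, 15, 10, 0),  # data from 10:00 a.m.
    back_online=datetime(2007, 1, 15, 13, 0),    # functional by 1:00 p.m.
    rpo=timedelta(hours=2),
    rto=timedelta(hours=1),
)
# Both objectives are met exactly at their limits.
```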
The vast difference in user (distributed) and
organizational (data center) requirements caused a
data protection split. No longer could one product
meet all the demands of the entire organization.
Products that met the data center requirements
seldom met the needs of the individual and
distributed users. Products that met performance
requirements did not always meet storage cost
requirements. Products that met user requirements
rarely met data center requirements.
It is this volatile combination of current laws,
regulations, litigious e-discovery, and historical
evolution that has provided the fertile ground
for clever, innovative NGDP products. The
first wave of NGDP technologies and products
has already appeared on the market with great
fanfare and hype. These technologies include VTL,
CDP, continuous snapshot, distributed backup to
disk from remote office branch office (ROBO),
de-duplication, both local and global (formally
designated single instance storage [SIS] and global
single instance storage [GSIS]), and data encryption
(in-flight and at-rest).
Fertile Ground for Innovation
[Figure 1: NGDP Technology Grid, plotting backup, virtual tape, snapshot, mirroring, replication, CDP, and distributed ROBO backup to disk by data protection level versus data recoverability.]
Source: SVB Alliant
Next Generation Data Protection
Technologies and Markets
Table 1: Select Companies in the NGDP Market
°	 VTL. Public: EMC, FalconStor, HP, IBM, Network Appliance, Quantum, Sun/StorageTek. Private: COPAN Systems, Data Domain, Diligent, Fujitsu/Siemens, Neartek, Sepaton, Spectra Logic
°	 CDP. Public: CA/XOsoft, CommVault, EMC/Kashya, FalconStor, IBM/Tivoli, Iron Mountain, Symantec/Revivio. Private: Asempra, Asigra, Atempo/Storactive, FilesX, InMage, Mendocino, SonicWALL, TimeSpring
°	 Continuous Snapshot a.k.a. SAS. Public: Network Appliance. Private: Cloverleaf, Exanet
°	 Distributed ROBO Backup. Public: EMC/Avamar, Iron Mountain, Symantec/VERITAS. Private: Asigra, EVault, Signiant
°	 De-duplication. Public: EMC/Avamar, FalconStor (VTL), Quantum/ADIC/Rocksoft (imminent), Symantec/Data Center Technologies (ROBO). Private: Asigra (ROBO), Data Domain, Diligent, ExaGrid, Sepaton
Source: SVB Alliant
virtual tape libraries (vtls)
Virtual tape is software that provides the image of
a logical tape drive on magnetic hard disk drives
(HDDs). The software emulates the actual physical
tape; however, all the data is in a tape-streaming
format existing virtually on HDD (see Figure 2).
Staging tape data on disk in this way is much faster
than writing it directly to tape.
VTL value proposition claims:
1.	 VTLs utilize much less real tape as the data
moves from virtual tape to physical tape. Many
backup operations often do not fill the physical
tape media when applications write to the tape.
VTLs can reduce the amount of tape media in a
range of 25 to 98 percent.
2.	 VTLs provide up to five times faster backups
than backup to native tape drives.
3.	 VTLs have been proven to increase backup
reliability by more than 25 percent over native
tape drives or libraries.
4.	 VTLs are much easier to share on a storage area
network (SAN) than native tape drives or libraries.
The largest value proposition comes from the
mainframe market space.
VTLs originated in the mainframe space and have
become a must have in that market. The mainframe
market uses tape differently than the open server
market (Linux, Windows, and Unix). Tape is an
important form of nearline storage as well as a
medium for backup and archiving.
Mainframe tape utilization is typically incredibly
low (on average less than 10 percent) because of the
heavy use as nearline storage. Some tapes have only
a single data set on them. VTLs radically change
[Figure 2: Open Server VTL, showing Linux, Unix, and Windows servers backing up to a VTL staged in front of a physical tape library.]
Source: SVB Alliant
that equation and increase tape utilization to greater
than 90 percent using virtual tape stacking—placing
multiple virtual tapes on a single physical tape
cartridge (see Figure 3). The value proposition for
mainframes based on increased tape utilization (nine
times) alone is incredibly high. Providing faster
backups is another value; however, it is a distant
secondary value proposition for mainframes on par
with improved tape data reliability.
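The stacking arithmetic behind that nine-times gain can be sketched as follows (a simplified illustration; the greedy first-fit placement here is our own choice, not how any particular VTL allocates cartridges):

```python
def stack_virtual_tapes(virtual_tapes_gb, cartridge_capacity_gb):
    """Pack virtual tapes onto physical cartridges, first-fit descending.

    Returns (cartridges_used, utilization), where utilization is the
    fraction of purchased cartridge capacity actually holding data.
    """
    cartridges = []  # used capacity of each physical cartridge
    for size in sorted(virtual_tapes_gb, reverse=True):
        for i, used in enumerate(cartridges):
            if used + size <= cartridge_capacity_gb:
                cartridges[i] += size  # stack onto an existing cartridge
                break
        else:
            cartridges.append(size)    # start a new cartridge
    utilization = sum(cartridges) / (len(cartridges) * cartridge_capacity_gb)
    return len(cartridges), utilization

# Twenty 30 GB virtual tapes on 300 GB cartridges: without stacking,
# that is 20 cartridges each 10 percent full; with stacking, far fewer.
used, util = stack_virtual_tapes([30] * 20, 300)
```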
The open server market is a much different story.
This market uses tape only for backup and archival
data storage. Typical open server tape utilization is
usually greater than 60 percent. Using virtual tape
stacking to increase tape utilization to more than 90
percent means a value proposition of only 50 percent
improvement (versus the 900 percent improvement)
for mainframes. The primary open server VTL
value proposition has, until recently, been derived
from increased backup and recovery speeds of disks
versus tapes, improved tape data reliability, and
overall lower total cost of ownership than standard
tape drives and tape libraries. And VTLs are not the
only disk-to-disk (D2D) solution being deployed.
Many IT organizations use CommVault, Legato,
NetBackup, and others that back up natively to
disk and/or compressed disk, eliminating tape
entirely. Alternatively, they can back up to a disk,
keep it online for 90 days, then back up to a high-
density fully populated tape, ensuring rapid data
access or RTO.
More recently, open server VTL products have added
de-duplication. VTL de-duplication meaningfully
reduces the amount of data stored on disk or tape,
ranging from 50 to 95 percent. This reduction can
quantifiably and meaningfully decrease the cost
of storage, potentially making open server VTLs a
no-brainer solution and a viable acquisition strategy
for companies serving mainframe customers.
Figure 3: Mainframe VTL
Source: SVB Alliant
De-duplication is a hot new area in storage, not limited to
VTL. Some companies are moving quickly into this market
(Diligent and ExaGrid) to compete with the market initiator
Data Domain. Even industry giant Network Appliance is
planning to release a product in the near future. This topic
is discussed further later in this document.
There are two different methodologies for
implementing de-duplication with a VTL in real time:
as the data is being stored and as a background task
after it has been stored. Real-time de-duplication is
done on the fly as the backup data is being received
by the VTL. It de-duplicates based on the files
and/or data blocks (fixed-length data blocks with
hash marks) and/or blocklets (variable-length data
blocks with hash marks) used by the de-duplication
database for comparison, then de-duplicates before
the data is written to disk. Data Domain, Diligent,
ExaGrid, and FalconStor are in this market, and
Quantum/ADIC/Rocksoft has announced it is
entering imminently. This methodology has the
advantage of consuming disk storage space only for
new data. However, it tends to limit scalability: as
the data scales and the de-duplication metadata
database grows, both backups and restores noticeably
slow down. The larger the metadata database
becomes, the longer it takes to de-duplicate before
writing to disk, and the longer it takes to recover the
data, because more and more changes must be applied
back to the original data as it ages.
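The real-time approach can be sketched with a hash index consulted before anything is written to disk (an illustration with fixed-length blocks only; the class and method names are ours, and real products also handle variable-length blocklets and hash collisions):

```python
import hashlib

class InlineDedupStore:
    """Store fixed-length blocks, writing each unique block only once."""

    def __init__(self):
        self.index = {}    # block hash -> location of the stored block
        self.blocks = []   # the de-duplicated "disk"

    def write(self, data, block_size=4096):
        """De-duplicate in-flight: hash each block before it hits disk."""
        locations = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.index:       # only new data consumes space
                self.index[digest] = len(self.blocks)
                self.blocks.append(block)
            locations.append(self.index[digest])
        return locations                       # "recipe" to rebuild the stream

    def read(self, locations):
        """Reassemble the original stream from stored blocks."""
        return b"".join(self.blocks[loc] for loc in locations)
```

Note that every write pays a lookup against the index, which is exactly why inline de-duplication slows as the metadata grows.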
Background de-duplication waits until the backups
have been written to disk, then discards the duplicate
data. The advantage of this methodology is that it
scales essentially without limit. The de-duplication
database never slows the backups, and it provides a
high degree of confidence that no unique data is
deleted accidentally. Because it de-duplicates after
the data has been written to the VTL, it always
removes the older duplicated data rather than the
newest data. De-duplication can operate at the
smallest atomic unit (the byte) or at any larger
granularity.
Additionally, most users and administrators typically
recover the latest version of their data or at least
the latest clean version of their data. Background
de-duplication means that the data they recover
will require the least amount of manipulation,
leading to incredibly fast recoveries. The downside
to this methodology is that it requires a very
sophisticated content awareness of the data to
be effective. In other words, the de-duplication
engine must be able to see the files, extensions,
and headers after the data is written to the VTL.
Background de-duplication also requires more hard
disk storage to be utilized as the cache for the
de-duplication. The size of the hard disk cache
continues to grow as the amount of data being
backed up increases. Currently, only Sepaton has
developed VTL-based background de-duplication
and has a number of patents pending.
It is important to realize that de-duplication is not
limited to VTLs. De-duplication provides equivalent
or greater value for all D2D NGDP. There is also a
catch with VTL systems.
The underlying assumption behind the value of
open server VTL systems is that data protection
software backs up only to tape or to what appears to
be tape, hence the term virtual tape. Through 2004
and much of 2005, major backup software products,
such as those from Computer Associates (CA),
CommVault, EMC, and Symantec, were limited to
backing up to tape and could not natively back up
to disk. If this were to continue to be true, VTLs
would have a bright future with open servers.
Unfortunately for the VTL suppliers, backup
software has changed. The vast majority of backup
software today, as mentioned, can natively back
up directly to disk or disk subsystems. And just
as VTL vendors are adding de-duplication to their
products, so are the backup software vendors. Many
can migrate data from disk to tape or optical disc
without the additional software overhead of a VTL.
As a general rule, backup users contract for
subscription services. Subscriptions automatically
provide the software upgrades that enhance the
backup software, such as backup to disk and/or
de-duplication. The net effect is that as backup
software has evolved and continues to evolve, it
eliminates much of the open server requirement
for VTLs, which in turn appreciably reduces the
potential VTL addressable market.
VTLs for open servers is a textbook example of
a “Band-Aid” or short-lived product that solves
a functional shortcoming of another technology
product—backup software in this case. Once
the shortcoming is fixed, we believe the market
disappears or greatly diminishes.
VTLs for mainframes is an altogether different
story. In this market the value is still quite high.
There are no inexpensive serial advanced technology
attachment (SATA) or SATA II disks in mainframe
direct access storage devices. Backup media will
continue to be tape for the foreseeable future. And
tape is still used as nearline storage. Mainframe
VTLs will continue to thrive. IT organizations
that have both mainframes and open servers are
able to leverage the mainframe VTL investment for
both environments. Those VTL vendors that have
products for both environments (Diligent, EMC,
Fujitsu/Siemens [CentricStor], IBM, and Sepaton
expected) should continue to have success.
Another downside to VTLs is that each product
has a proprietary user lock-in. There is no standard
data format or algorithm for VTL systems. Products
from different vendors are incompatible with each
other and often with other models from the same
vendor. Many VTL products use the FalconStor
VTL software (COPAN Systems, EMC, IBM, and
Sun/StorageTek).
Copying and/or moving the data from a VTL to
other media for long-term archiving or offsite data
storage is another major issue. Data can be moved or
replicated from a VTL in three different ways:
1.	 VTL manages movement of data between disk
and tape.
2.	 Backup software moves data from VTL to tape.
3.	 VTL systems replicate data to an offsite VTL.
Data moved from the VTL to tape is typically not
reconstructed in the native backup software format.
This means that the backup software cannot recover
the data natively without first going through the
VTL. When migrating the data from the VTL to
tape, inconsistency can be introduced into the
backup software catalog if it is not integrated with
the VTL. This is especially true when the catalog is
not informed that the data has been migrated from
the VTL to tape.
There are numerous vendors with VTL products
in the market today (COPAN Systems, Data
Domain, Diligent, EMC/Neartek, FalconStor, Fujitsu/
Siemens, HP, IBM, Network Appliance, Quantum,
Sepaton, Spectra Logic, and Sun/StorageTek).
Table 2 provides a brief comparison of VTL
products that are currently available.
Table 2: Select VTL Products on the Market
[Table 2 compares currently available VTL products from COPAN Systems, Data Domain, Diligent, EMC, FalconStor, Fujitsu/Siemens, HP, IBM, Neartek, Network Appliance, Quantum, Sepaton, Spectra Logic, and Sun/StorageTek across: models; basic and compressed usable capacity (TB); de-duplication method (inbound or background) and availability; virtual library and virtual drive ranges; maximum throughput (TB/hr); FC/iSCSI interfaces; export to native tape libraries; encryption at rest; mainframe z/OS support; offsite D2D replication; and underlying VTL software.]
Source: SVB Alliant
continuous data protection (cdp)
Based on the trade press, CDP is even hotter
than VTLs in the data protection market. CDP is
defined by the Storage Network Industry Association
(SNIA) to be:
“A methodology that continuously captures or tracks
data modifications and stores changes independent
of the primary data, enabling recovery points
from any point in the past. CDP systems may be
block-, file-, or application-based and can provide
fine granularities of restorable objects to infinitely
variable recovery points.”
IDC’s definition of CDP is similar:
“Continuous data protection, also referred to as
continuous backup, pertains to products that track
and save data to disk so that information can be
recovered from any point in time, even minutes
ago. CDP uses technology to continuously capture
updates to data in real time or near real time,
offering data recovery in a matter of seconds. The
objectives of CDP are to minimize exposure to data
loss and shorten time to recover.”
In general, CDP is a journaling function that keeps
track of all application data changes and time-stamps
them. It theoretically allows an application or a user
to recover data from any point in time.
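In spirit, that journaling function can be sketched as a time-stamped change log replayed up to any chosen instant (a toy illustration; real CDP products journal blocks or file operations, and the class here is our own):

```python
import bisect

class CDPJournal:
    """Time-stamp every write so any past state can be reconstructed."""

    def __init__(self, initial_state):
        self.base = dict(initial_state)
        self.log = []  # (timestamp, key, new_value), appended in time order

    def record_write(self, timestamp, key, value):
        """Capture a data change independent of the primary copy."""
        self.log.append((timestamp, key, value))

    def restore_to(self, timestamp):
        """Replay journaled writes up to (and including) the instant given."""
        state = dict(self.base)
        # The journal is time-ordered, so replaying a prefix of it
        # reproduces the state at any point in the past.
        cutoff = bisect.bisect_right([t for t, _, _ in self.log], timestamp)
        for _, key, value in self.log[:cutoff]:
            state[key] = value
        return state
```

Because every write is captured, the recovery point can be any instant, which is what distinguishes CDP from periodic snapshots.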
CDP has an RPO of zero—that is, the user can
restore data to the moment just before the data
corruption or failure, resulting in no data loss.
Traditional data protection technologies usually
have RPOs ranging from six to 24 hours, which can
mean far too much potential data loss for mission-
critical applications. Although the definition of CDP does
not directly speak to the RTO, most CDP solutions
are capable of very fast data restoration, ranging
from a few seconds to several hours depending on
the vendor and the product.
[Figure 4: CDP, showing Linux, Unix, and Windows servers running CDP software or agents that feed a CDP collector, which writes to SAN and NAS storage.]
Source: SVB Alliant
This can be heady stuff for the IT manager charged
with protecting the data and can be a significant
expense. For instance, the infrastructure costs of
snapshots or high-function VTLs are similar to
the costs of CDP infrastructure. If CDP costs are
compared with those of traditional data protection,
there can be a large divergence. The features and the
promises of CDP are not free.
CDP’s value comes from three different aspects. The
first is as a point solution, driving the RPO toward
zero while vastly accelerating the RTO. The second
is as a platform or core technology: because it tracks
data at a very fine grain (every single write operation),
with guarantees across the entire application data
set, it inherently protects application coherency and
consistency. The third is the ability to replicate
protected data from multiple locations to a central
site and to restore it on different physical machines.
1.	 CDP Point Solution
The RPO and RTO improvements are quite
useful for the IT organization’s most important
applications. These applications are deemed
mission critical because they play such an integral
role in operations. For instance, a brokerage house
uses a database to transact and recover security
trades that drive its business. If this database
is incapacitated, the brokerage’s main business
stops functioning, with negative impacts such as
loss of revenue, loss of customers, and possibly
regulatory exposure. The RTO and RPO benefits
are often also valuable for applications that aren’t
mission critical but are nonetheless very important,
and that are either very large or changing so fast
that traditional backup and restoration technologies
are untenable.
In both cases the ability to quickly restore
any previous version of the application’s data
is important.
CDP does not have any gaps in the protected data
because it captures every file, block, or table change
and nuance in the system being protected as it occurs.
To best illustrate the gains CDP achieves as a point
solution, it is worth using a real-life example of a
mission-critical application recovery without and
then with a CDP solution in place.
Microsoft Exchange is the principal e-mail server
for a majority of organizations. It is the primary
communications medium for many organizations’
employees, suppliers, vendors, and, most important,
customers. This makes it mission critical.
Microsoft Exchange is also extraordinarily difficult
to restore with most data protection applications.
Legacy data protection applications require hours,
days, weeks, and occasionally months (when things
go really, really awry!) for a full Microsoft Exchange
restoration, which is far too long for a mission-critical
application. Because restoring Microsoft Exchange is
a lengthy, complex endeavor whose duration grows
with the number of mailboxes and messages that
need restoring, it is an ideal candidate for CDP
technology. Examining the Microsoft Exchange
restore process makes this abundantly clear:
°	 First, apply the last full backup.
°	 Next, apply the transaction logs—if they are
available (and often they are not).
°	 If the expertise is available, the messages and
the transactions are restored to each individual
mailbox, which is very time consuming. Most
administrators will not even make the attempt,
leaving a lot of data that is never restored.
°	 During the restoration process, Microsoft Exchange
is down and a temporary server is required.
CDP makes Microsoft Exchange recoveries painless
and fast. First it rewinds Microsoft Exchange back
to the last known consistency point and gets it
up and running in seconds or minutes. When
Microsoft Exchange is running again, it provides
a simple point-and-click restoration of messages
and transactions back to the individual mailboxes,
quickly and easily.
A similar recovery experience exists for many
enterprise applications such as IBM DB2, IBM
Informix, Microsoft SQL Server, Oracle (other
than the new 10g Enterprise versions), and Sybase.
Older enterprise applications typically have a larger
difference in recovery times and processes.
2.	 CDP Platform
As discussed, CDP captures all of the data as it
is created. The result is the ability to re-create
application data from any previous point in time
with inherent application consistency properties.
Comparing CDP with traditional approaches to
achieving similar application consistency (backup,
VTLs, snapshots, and SASs) demonstrates
why CDP was developed. Traditional approaches
capture a point in time of an application’s data.
This can be problematic when the data is
spread across multiple storage units or media.
They cannot guarantee that all the pieces are
captured at exactly the same time. One part of the
application may be captured at 9:00:01 whereas
another part is captured at 9:00:14. This scenario
invalidates the internal transactional consistency
that is embedded in database applications. Those
applications (e.g., IBM DB2, IBM Informix,
Microsoft Exchange, Microsoft SQL Server,
Oracle, Sybase, file systems, and others) usually
rely on other mechanisms such as hot backup or
online backup to allow the backups to be taken
while the application is still operationally active
and to produce a recoverable backup set that is
application consistent.
These external actions force the applications to
freeze momentarily while the backup is occurring.
This process is complicated and can be error prone,
with a negative impact on the performance of the
underlying application.
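The time-skew problem above can be shown with a toy simulation; the volumes, accounts, and timestamps are hypothetical, and each write of a debit/credit pair must land together for the data to be transactionally consistent:

```python
# Each write is (time, volume, key, value). A debit/credit pair spans volumes.
writes = [
    (1, "volA", "acct_1", 100),   # debit recorded on volume A
    (2, "volB", "acct_2", 100),   # matching credit on volume B
    (3, "volA", "acct_1", 250),
    (4, "volB", "acct_2", 250),
]

def snapshot(writes, volume, as_of):
    """Point-in-time copy of one volume, taken independently of the others."""
    state = {}
    for t, vol, key, val in writes:
        if vol == volume and t <= as_of:
            state[key] = val
    return state

# Traditional approach: the two volumes are captured a few seconds apart.
snap_a = snapshot(writes, "volA", as_of=3)  # captured at t=3
snap_b = snapshot(writes, "volB", as_of=2)  # captured at t=2
print(snap_a, snap_b)   # acct_1=250 but acct_2=100: inconsistent pair

# CDP-style approach: one journal with global write order; pick one instant.
snap_a2 = snapshot(writes, "volA", as_of=2)
snap_b2 = snapshot(writes, "volB", as_of=2)
print(snap_a2, snap_b2)  # acct_1=100 and acct_2=100: consistent pair
```

The first pair of snapshots splits a transaction across the capture boundary; choosing a single instant from a globally ordered journal, as CDP does, avoids the problem.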
In contrast, most CDP implementations maintain
application consistency and coherency without
application disruption or operational complexity.
The CDP stored data can be migrated into other
protection domains (e.g., archival, tape, or optical
disc) without losing this attribute.
Another benefit of CDP is its ability to
nondestructively provide any-point-in-time virtual
data copies. Some CDP applications automatically
run policy-based test and audit processes on those
copies of the data and mark the corresponding
times as significant, or as valid points in time for
recovery if primary data is corrupted or infected
with malware (viruses, worms, and the like). Other
CDP applications allow the administrator to write
scripts that automate the same processes. Not all
CDP applications have this capability.
The platform concept is called continuous data
technology (CDT) to distinguish it from CDP.
The CDT value comes from incredibly fast RTO
with known application consistency and coherency.
This in turn allows fast recovery from application
data corruptions, malware attacks, criminal data
destruction, and the actions of malicious employees.
Additionally, as the protected data is migrated from
online to nearline to offline, the application data
consistency and coherency are preserved for each of
those points in time.
3.	 CDP Protected Data Replicability
Data run through a CDT platform lends itself
well to replication (offsite disaster recovery) and
imaging—using data for alternate purposes like
testing, development, archiving, reporting, and
auditing. Some vendors are already exploiting these
areas, with distributed many-to-one for ROBO
(Asigra, CA/XOsoft, CommVault, FalconStor) and
one-to-one for data center to disaster recovery (DR)
site (EMC/Kashya, Symantec/Revivio). Many of the
other vendors have some version of it on their two-
to-three year roadmap.
There are numerous variations on how CDP can
be deployed. It can be part of the application itself,
or it can be a stand-alone product. Oracle 10g
Enterprise Edition has a feature called Flashback
that is an example of CDP technology integrated
into the application—the only current example of
such a deployment.
As a stand-alone product, CDP can be implemented
as host-based (software that must be installed
on the application host server), network-based
(the software runs on hardware that is connected
to the storage or Internet Protocol network,
independent of the application host and the
storage subsystems), or it can be a part of the
storage subsystem.
A host-based CDP solution can be deployed
via either operating system device drivers (as
system agents) or as a part of
the application. This is the approach of Asempra,
Atempo, CA/XOsoft, CommVault, EMC/Kashya,
FalconStor, FilesX (when not utilizing Cisco’s
intelligent switching module SSM), Iron Mountain,
SonicWALL, and TimeSpring.
Network implementations are the most flexible and
involve installing software on a high-functioning
storage network switch (Mendocino), using an
appliance to deploy into the storage network (EMC/
Kashya when utilizing Cisco’s intelligent switching
module SSM, FalconStor, Symantec/Revivio, and
SonicWALL), or using dedicated servers on the
Transmission Control Protocol/Internet Protocol
(TCP/IP) network (Asigra). The storage network
based solutions carry a higher infrastructure cost,
particularly for companies that do not already have
these networks in place. Additionally, some of
these solutions (Asigra and Symantec/Revivio) can
accommodate multiple related customer applications
that must be protected, restored, and
synchronized with one another. (Note: A few of the
host-based vendors have promised network versions
of their solutions in their roadmaps.)
CDP embedded in the storage subsystem does not yet
exist, but several vendors are working on this variation.
In summary, CDP makes a lot of sense for those
applications and data that require the absolute
minimum amount of data loss with the fastest
possible recovery. CDP is the best possible
data protection with the lowest RPO and RTO
while providing application data consistency and
coherency. It is the highest level of insurance for
data currently available on the market. And just like
insurance, there is a cost to CDP. CDP typically
utilizes 0.4 to 2.5 times the amount of disk that the
primary data requires, depending on how far back
in time the user wants to be able to restore. Plus,
there are the software, maintenance, and subscription
licensing fees. This leads to the conclusion that
CDP has a definitive role in data protection for the
organization’s mission-critical data that cannot be
lost under any circumstances.
Table 3 provides a brief noncomprehensive look at
those CDP products that are currently available.
Table 3: Select CDP Products on the Market
Source: SVB Alliant

Products compared: Asempra Business Continuity Server; Asigra Televaulting 6.2; Atempo/Storactive LiveBackup; CA/XOsoft Enterprise Rewinder; CommVault QiNetix 6.1; EMC/Kashya RecoverPoint; FalconStor CDP; FilesX XpressRestore 3.0; InMage DR-Scout VX; IBM/Tivoli CDP for files; Iron Mountain LiveVault CDP; Mendocino Recovery One and HP CIC; Symantec/Revivio CPS; SonicWALL CDP; Symantec/VERITAS Backup Exec 10d Windows Server; TimeSpring Time Data.

Attributes compared: agentless operation (server or switch based); de-duplication (local/global or standard compression); mail support (Exchange, Notes, GroupWise); database support (Oracle, SQL Server, DB2); versioning across multiple generations; design method (block, file, or application); CDP capture basis (write or time); rollback basis (time or event); Windows 2000/2003 support; Linux support (Red Hat, SUSE); UNIX support (AIX, Solaris, HP-UX, Mac OS X); system-file protection; scale (local or enterprise); hardware (standard, intelligent switch, or unique); network (SAN or TCP/IP); encryption (at-rest, in-flight); WAN-optimized ROBO support; and replication to an offsite repository.
Continuous Snapshot (a.k.a. Small Aperture Snapshot or SAS) Near CDP or CDP-like
Continuous snapshot, or small aperture snapshot
(SAS), is often confused with CDP. In truth, it is
similar but not the same. Continuous snapshot
provides a coarser-grain RPO and an RTO similar to
the outside bounds of CDP (minutes to hours).
There are two primary differences between CDP and
SAS. First, snapshots are not continuous. There is a
time gap between snapshots, whereas CDP has no
time gap. This time gap ranges from minutes to days,
and this gap is the period of time in which data can
be lost between snapshots. Second, because the SAS
data capture has gaps between snapshot captures,
most products (Cloverleaf, Exanet, Microsoft,
Network Appliance, and Symantec/DCT) do not
have the same application consistency attributes as
CDP. This means they must pause application
operations for the snapshot data to be in a
recoverable form.
Small aperture snapshots are, as a rule, incremental
over previous snapshots. This means they capture
only the changes between snapshots, giving them
disk consumption attributes similar to those of
CDP solutions.
Figure 5: Continuous Snapshot or SAS (Linux, UNIX, and Windows hosts, with no agents, protected by a series of subsequent small aperture snapshots on NAS storage)
Source: SVB Alliant
Vendor                                   Exanet                 Network Appliance              Cloverleaf
Product                                  Exastore               V-Series/FAS/GX ONTAP®         iSN
                                                                Snapshot™
Snapshot granularity                     User defined (secs.)   User defined (mins.)           User defined (mins.)
Max. number of snapshots                 No limits              255 per virtual volume         200,000 per iSN
                                                                (up to 171,000 volumes)
Max. file system volume size             1 Exabyte              16 TB                          16 TB
Max. block volume size                   NA                     2 TB                           64 TB
Asynchronous remote replication          Yes                    Yes                            Yes
Consistency groups                       No                     Yes                            Yes
Fast rollback restores to any snapshot   Yes                    Yes                            Yes
When snapshots are full volume rather than incremental,
the amount of storage required grows amazingly
fast. Each full-volume snapshot is the same
size as the primary data. If the primary data takes
up as little as 1 terabyte (TB) of storage, continuous
snapshots with an RPO of 15 minutes would require
96 TB of storage for just a single day of snapshots.
This is not a financially practical solution for
most organizations, especially when compared with
the incremental snapshot and CDP approaches,
which require only 0.4 to 2.5 times the amount of
disk space that the primary data consumes.
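The arithmetic above can be restated in a few lines (the 1 TB volume and 15-minute RPO are the example figures from the text):

```python
primary_tb = 1.0
snapshots_per_day = 24 * 60 // 15          # one snapshot every 15 minutes -> 96

# Full-volume snapshots: every snapshot is as large as the primary data.
full_volume_tb = snapshots_per_day * primary_tb
print(full_volume_tb)                      # 96.0 TB for a single day

# Incremental snapshots / CDP: only changed data is kept, which the text
# pegs at 0.4x to 2.5x of the primary data in total.
incremental_range_tb = (0.4 * primary_tb, 2.5 * primary_tb)
print(incremental_range_tb)                # (0.4, 2.5) TB
```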
One major advantage of continuous snapshot
over CDP is that it does not require any agents.
And, correspondingly, it does not have the costs
associated with agents.
There are a limited number of vendors with continuous
snapshot products on the market today (Cloverleaf,
Exanet, Network Appliance). Table 4 provides a brief
comparison of those continuous snapshot products
currently available.
Table 4: Select SAS Products on the Market
Source: SVB Alliant
Distributed Remote Office Branch Office (ROBO) Backup to Disk
Distributed ROBO backup to disk is the only current
data protection technology designed from the ground
up for a distributed environment. This means that
the data protection software recognizes that ROBOs
are connected to the central data center or DR site
over limited-bandwidth wide area networks (WANs);
that each ROBO has limited or no data protection
administration skills onsite; and that users require local
recovery of their data without an administrator. ROBOs
must protect a wide variety of data, from operating
systems in servers to desktops and laptops that range
in size from a single user to hundreds of users.
Distributed ROBO backup to disk has been around
for years. Most organizations were not aware of its
existence because it had been primarily a service provider
technology. Only recently (over the past few years) has it
become licensable by end users.
Because of this, distributed ROBO backup to
disk must be more efficient than traditional
backup or replication technologies. Additionally,
it must be transparent to users with intuitive
data recovery, provide multiple data generations
(SAS), or provide the RPO/RTO granularity of
CDP. CDP has in fact been added as a feature to
several distributed ROBO backup-to-disk products
(Asigra Televaulting, CommVault QiNetix, and
Iron Mountain/LiveVault).
Figure 6: Distributed ROBO Backup to Disk (remote collectors at each ROBO, plus laptops, connect over a TCP/IP WAN to a central backup system with NAS at the central data center)
Source: SVB Alliant
To provide the WAN efficiency so vital to
the distributed ROBO backup-to-disk value
proposition, it should include de-duplication
(locally and globally), WAN transmission of just
the changes or delta data, and compression of the
remaining data as it is transmitted across the WAN.
ROBO de-duplication eliminates transmission and
storage of duplicate files or data blocks. Several
distributed ROBO backup-to-disk vendors (Asigra,
EMC/Avamar, Symantec/DCT) have their own
de-duplication methodologies. The data block
methodology tends to be much more efficient
than the file methodology because it organizes
the protected data bytes into data blocks of a
specific standard length. It is these arbitrary data
blocks that de-duplication software is tracking.
Any block that repeats is neither transmitted nor
stored. The file methodology works with entire
files, which are much bigger than the arbitrary data
blocks. If the file or any part of the file changes,
the entire file is transmitted and stored. The typical
range of de-duplication data reduction is between
60 percent and 90 percent. Eliminating this much
data prior to transmission radically reduces
bandwidth requirements for ROBO data protection.
It also radically reduces the amount of storage
required for protected data.
The local de-duplication takes place at the ROBO
location. The global de-duplication takes place at
the central site data center or DR center. Global
de-duplication removes the duplicates among all of
the ROBO sites.
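The block-level local/global de-duplication described above can be sketched as follows; the tiny block size, the sample data, and the names are hypothetical, and real products use far larger blocks and persistent hash indexes:

```python
import hashlib

BLOCK = 4  # toy block size; real products use e.g. multi-KB blocks

def blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def dedupe(data, index):
    """Store only blocks whose hash is not already in `index`;
    return the number of new bytes that must be transmitted/stored."""
    new_bytes = 0
    for b in blocks(data):
        h = hashlib.sha256(b).hexdigest()
        if h not in index:
            index[h] = b
            new_bytes += len(b)
    return new_bytes

central_index = {}                      # global index at the central site
robo1 = b"AAAABBBBCCCC"                 # two ROBO sites with overlapping data
robo2 = b"AAAABBBBDDDD"

sent1 = dedupe(robo1, central_index)    # all 12 bytes are new
sent2 = dedupe(robo2, central_index)    # only DDDD is new: global dedup
print(sent1, sent2)                     # 12 4
```

The second site transmits only the blocks the central index has never seen, which is exactly the cross-site reduction global de-duplication provides.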
The next WAN efficiency gain comes from the
transmission of just the ROBO delta changes. Delta
change transmission means that only the files or delta
blocks that changed are sent across the WAN. This
has been a common feature for server replication and
traditional backup for years and has a more profound
positive effect on de-duplicated, distributed ROBO
to disk. De-duplication requires a meta data database
to keep track of all of the data blocks and/or files
that pass through the distributed ROBO backup-to-disk
software. It then removes the duplicates and
places a marker or stub in their place. When the data
block or file is recalled or recovered, that marker
lets the database know where to pull the stored data.
The meta data database is the engine that allows de-
duplication to take place. Unfortunately, it can also
limit the scalability of each image of the software. The
larger the database becomes—and it will continue to
get larger as more and more data is protected—the
slower it becomes. As the meta data database slows,
so do the backups and, more notably, the recoveries.
This inherent size limitation can mean that multiple
iterations of the software will be running in a
large enterprise environment. Delta changes relieve
much of the effect of these size limitations. By
reducing the backups to delta
changes, the de-duplication meta data database
has less data to sort and track on an ongoing basis.
Less data means ultimately greater performance,
scalability, and efficiency.
The third phase of the distributed ROBO backup-to-
disk efficiency comes from compression. This part can
be confusing for some IT managers, analysts, and
the trade press. De-duplication is often represented
as supercompression technology. When data is
de-duplicated, it does not have to traverse the WAN
at all; however, the rest of the data must still traverse
the WAN. Compression of that data can further
enhance the distributed ROBO backup-to-disk
software efficiency. Standard compression removes
redundancy in the data, with characteristic results of
approximately 2:1.
Combining de-duplication (removing 60 to 90 percent
of the protected data that must traverse the WAN
and be stored on disk), delta changes (reducing
the amount of data that requires de-duplication),
and compression (further reducing the amount of
data that must traverse the WAN and be stored on
disk) yields an incredibly efficient data protection
system for distributed ROBOs.
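A toy end-to-end sketch of the three stages shows how the savings compound; the block size and sample volumes are hypothetical:

```python
import hashlib
import zlib

BLOCK = 8  # toy block size

def block_hashes(data):
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

yesterday = b"A" * 64 + b"B" * 64
today     = b"A" * 64 + b"C" * 32 + b"D" * 32   # last 64 bytes changed overnight

# Stage 1 - delta changes: only blocks whose hash differs from yesterday's move on.
old = block_hashes(yesterday)
new = block_hashes(today)
changed = [today[i * BLOCK:(i + 1) * BLOCK]
           for i in range(len(new)) if i >= len(old) or new[i] != old[i]]

# Stage 2 - de-duplication: identical changed blocks are sent and stored once.
seen, unique = set(), []
for b in changed:
    h = hashlib.sha256(b).digest()
    if h not in seen:
        seen.add(h)
        unique.append(b)

# Stage 3 - compression: squeeze what is left before it crosses the WAN.
payload = zlib.compress(b"".join(unique))

print(len(today), sum(map(len, changed)), len(payload))
```

Of the 128 bytes of primary data, the delta stage passes on 64, de-duplication keeps 16, and compression shrinks the final payload further, mirroring the pipeline the text describes.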
Another factor that improves the speed of
recoverability (RTO) is that all backups, just like
CDP, are native to disk. They do not require tape,
tape libraries, or virtual tape, although the data can
be migrated to tape as it ages.
Distributed ROBO backup to disk is available
today from six vendors in various forms
(Asigra, EMC/Avamar, IBM/TSM, Iron Mountain,
Seagate/EVault, and Symantec). Table 5 provides
a brief non-comprehensive comparison of those
distributed ROBO backup-to-disk products that
are currently available.
Table 5: Select Distributed ROBO Backup-to-disk Products on the Market
Source: SVB Alliant
		Vendor	Asigra	EMC/Avamar	Iron Mountain	Seagate/EVault	Signiant	Symantec
		Product	Televaulting	AXIOM Replicator	LiveVault InSync/InControl	InfoStage Desktop	ArcWare Continuum RDP	Pure Disk
Server (S) Desktop (D) Laptop (L) support
Windows	 	 	 S/D/L	 S/D/L	 S/D/L	 S/D/L	 S	 S	
Red Hat Linux	 	 	 S/D/L	 S/D/L	 S/D/L	 S/D/L	 S	 S	
Novell Suse Linux	 	 	 S/D/L	 No	 No	 No	 S	 S	
Novell NetWare	 	 	 S/D/L	 No	 No	 S/D/L 	 No	 No	
Mac OSX	 	 	 S/D/L	 No	 No	 No	 S	 No	
HP-UX	 	 	 S/D	 No	 No	 S/D 	 S	 No	
HP Tru-64 Unix	 	 	 S/D	 No	 No	 No	 No	 No	
SUN Solaris	 	 	 S/D	 S/D	 S/D	 S/D	 S	 No	
IBM AIX	 	 	 S/D	 S/D	 No	 S/D 	 No	 No	
EMC VMWare	 	 	 S/D	 No	 S/D	 No 	 No	 No	
IBM iSeries OS-400	 	 	 S	 No	 No	 S 	 No	 No
Mail Server and Database Support
Microsoft SQL Server	 	 	 Yes	 Yes	 Yes	 Yes	 Yes	 No	
Microsoft Exchange Server and Outlook 2000/2003	 	 Yes	 Yes	 Yes	 Yes	 Yes	 No	
Oracle 8 and above	 	 	 Yes	 Yes	 Yes	 Yes	 No	 No	
DB2	 	 	 Yes	 No	 No	 No	 No	 No	
MySQL	 	 	 Yes	 No	 Yes	 No 	 No	 No	
PostGresSQL	 	 	 Yes	 No	 Yes	 No 	 No	 No	
IBM Lotus Notes/Domino Server	 	 	 Yes	 No	 No	 No	 No	 No	
Novel Groupwise	 	 	 Yes	 No	 No	 No	 No	 No
Advanced Functionality
CDP	 	 	 Yes	 No	 Yes	 Yes	 No	 No	
Autonomic verification and healing (data cleansing)		 	 Yes	 Yes	 No	 No	 No	 No	
No host software required (agent)	 	 	 Yes	 No	 No	 No	 No	 No	
Local de-duplication (SIS)	 	 	 Yes	 Yes	 No	 No	 No	 Yes	
Global de-duplication (GSIS)	 	 	 Yes	 Yes	 No	 No	 No	 Yes	
Enterprise scalability	 	 	 Yes	 No	 Yes	 No 	 No	 No	
Bare metal restore	 	 	 Yes	 No	 No	 No	 No	 No	
Archival	 	 	 Yes	 No	 Yes	 Yes	 No	 No	
ROBO (local) recovery	 	 	 Yes	 Yes	 Yes	 Yes	 No	 Yes	
WAN Opt (compression,delta changes, other)	 	 	 Yes	 No	 No	 No	 Yes	 No	
Native backup to tape and tape libraries	 	 	 No	 No	 No	 No	 Yes	 No	
File search capabilities	 	 	 Yes	 Yes	 Yes	 Yes	 No	 Yes	
Single pane of glass management (Web portal)	 	 	 Yes	 Yes	 Yes	 Yes	 Yes	 No	
Secure encryption in-flight and at-rest	 	 	 Yes	 Yes	 No	 Yes 	 In-flight	 Yes
De-duplication — a.k.a. Single Instance Storage (SIS) and Global Single Instance Storage (GSIS)
As discussed in the previous sections, there
are measurable benefits from de-duplication.
The methods of de-duplication are by files,
data blocks (arbitrary fixed lengths of data with
hash marks), blocklets (variable lengths of data
with hash marks), and bytes. Files are coarse-grain
de-duplication, data blocks are medium-grain,
blocklets are medium- to fine-grain, and bytes are
very fine-grain.
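The advantage of blocklets over fixed-length blocks can be shown with a deliberately simplified sketch. Real blocklet schemes cut where a rolling hash (e.g., a Rabin fingerprint) of the last few bytes matches a target pattern; here, cutting after each space byte stands in for that content-dependent condition so the example stays deterministic:

```python
def blocklets(data, delimiter=0x20):
    """Toy content-defined chunking: cut after every delimiter byte (a
    space). A rolling-hash condition plays this role in real products."""
    out, start = [], 0
    for i, b in enumerate(data):
        if b == delimiter:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

def fixed_blocks(data, size=8):
    return {data[i:i + size] for i in range(0, len(data), size)}

original = b"the quick brown fox jumps over the lazy dog"
shifted = b"X" + original   # a single byte inserted at the front

# Fixed-size blocks: the insert shifts every boundary, so nothing matches.
fixed_shared = len(fixed_blocks(original) & fixed_blocks(shifted))

# Content-defined blocklets: boundaries move with the content and realign.
cdc_shared = len(set(blocklets(original)) & set(blocklets(shifted)))
print(fixed_shared, cdc_shared)   # 0 8
```

A one-byte insert destroys every fixed-block match but leaves all blocklets after the first intact, which is why variable-length chunking de-duplicates shifted data so much better.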
De-duplication is primarily being deployed for
secondary, replicated, or backup data. There are
some cases where it is starting to appear in primary
storage systems. As an enabling technology for
lowering storage costs (data at-rest) and WAN
costs (data in-flight), the de-duplication addressable
market has the potential to significantly grow
beyond the secondary data market. Tiered storage
environments are the most likely candidates to
deploy data de-duplication as data moves from
tier-one to tier-two storage. There is user fear that
de-duplication may delete unique data by mistake.
If that were to happen, customers would not know
about it until an application or a user attempted to
access the protected, stored data. The chances of this
happening are remote. Nevertheless, one data loss
can have far-reaching financial consequences for an
organization in the era of compliance.

Figure 7: De-duplication (incoming data is compared with previously sent data; only new data is stored)
Source: SVB Alliant
There are ways around this potential problem. One
method is to de-duplicate as a background task
(Sepaton), as previously discussed in
the VTL section. Data is first stored in the storage
system on a hard disk used as a cache, then the data
is de-duplicated. The downside of this approach
(nothing is free) is that it requires a bit more capacity
to be used as a temporary cache to store all of the
data while de-duplication algorithms are applied.
The other method is to cleanse or heal the
data (Asigra and Quantum/ADIC/Rocksoft). This
method constantly checks the data for errors.
If an error is discovered, the de-duplication
algorithm has the application resend the data. This
method essentially cleanses the data and provides
autonomous self-healing of the de-duplicated data.
Both methodologies ensure that no unique data is
deleted accidentally.
De-duplication is a very important enabling
technology. As an enabler for data protection
applications, de-duplication can be an integrated
feature of the application or a supplement to
older backup and replication products that
cannot perform de-duplication themselves. This
allows those applications to gain de-duplication
benefits without having to rip out and replace an
organization’s current data protection software.
As with most supplemental or feature adjunct
markets, it is essentially just a bandage. As such,
this part of the de-duplication market has a limited
addressable market, as discussed in the context
of VTLs. It is unlikely to counter the open server
VTL market decline as backup to disk becomes
more pervasive.
Stand-alone de-duplication is available from a few
vendors (Data Domain, Diligent, ExaGrid, and,
to a more limited extent, EMC/Avamar). Table 6
provides a brief comparison of those de-duplication
products currently available.
Table 6: Select De-duplication Products on the Market

Vendor                           Diligent                ExaGrid           Data Domain
Product                          ProtecTIER Appliance    ExaGrid Server    DD460G
Max. capacity (terabytes)        1,000                   5                 233
Max. throughput (terabytes/hr)   2.9 (cluster of 4)      0.5               4.6
WAN replication vaulting         No                      No                Yes

Source: SVB Alliant
Security and Encryption — In-flight and At-rest
Security is similar to insurance. It is a cost center
and does not generate profits for the organization
deploying it. And just like insurance, not having
it and needing it can be financially devastating.
Security has rapidly moved from an IT back-
burner issue to a very high priority for everyone up
through executive management and even the board
of directors. Much of the priority shift can be tied
to the spate of high-profile data losses and thefts and
the subsequently tighter regulatory environment.
The majority of the press on the topic has been
about lost, unencrypted backup data containing
sensitive personal data valued for identity theft.
The increased regulations and legislation, as
discussed earlier, specify not just the levels
of data protection and security required but also
financial and even criminal penalties
for compliance failures.
Security and encryption are a fundamental part
of data protection. Security means providing the
assurance that valuable, protected data will not be
taken away or accessed by unauthorized personnel
(see the SVB Alliant report “Securing Against the
Internal Threat,” August 2006). Many hackers and
criminal organizations reason that protected data
must be highly valuable or it would not be protected.
Figure 8: Security and Encryption (sites connected across a TCP/IP WAN, protected by firewalls)
Source: SVB Alliant
These hackers and criminal organizations are now
focusing their efforts on capturing this type of data
as well. Disgruntled employees will often apply their
malicious efforts to capture, corrupt, or destroy
protected data. This is because their efforts will
usually not be detected until the data has to be
recovered. Even then it is difficult to identify the
culprit without significant forensic analysis.
Encryption is no longer just a tape issue, and it is
not the only aspect of data storage security. Data
protection security must be part of the data life
cycle from creation to destruction.
IT generally views security as seven primary functions
when protecting data:
1.	 Intruder detection and prevention
2.	 Allowing authorized access while preventing
unauthorized access
3.	 Data in-flight (tapping) capture prevention
4.	 Data at-rest capture prevention
5.	 Audit trail of all access and changes
6.	 Guaranteed data change prevention
7.	 Digitally certified data destruction
1.	 Intruder Detection and Prevention
	 Intruder detection and prevention includes firewalls,
viral scanners, and tracing of external access to
the private networks. This is considered minimal
security and is offered by a wide variety of vendors.
2.	 Allowing Authorized Access While Preventing
Unauthorized Access
	 Authorized access makes sure that only those
who have the rights and the privileges to the data
can access it. This is usually managed through
access control lists (ACLs) or tokens. ACLs also
determine the level of data access. A corporate
administrator may have read-and-change access,
whereas a department administrator may have
only read access. ACL user authentication has
been driven by login and password. More recently,
user authentication has been moving towards
biometrics (fingerprints, voice prints, retinal
prints, and the like). Authorization security is
primarily found in server software (BitArmor,
EMC/RSA, Vormetric) and in some cases storage
networks (Brocade/McDATA, Cisco).
3.	 Data In-flight (Tapping) Capture Prevention
	Preventing data capture while in-flight over the WAN
starts with the virtual private network (VPN);
however, all TCP/IP networks can be tapped
or sniffed with nominal hardware and software.
This scenario requires encryption in-flight so
that even if the data is tapped, it is not readable.
Encryption for data in-flight can be located within
the application server (BitArmor, EMC/RSA,
Vormetric) or encryption appliances at the edge of
the WAN (Cipher Optics, Kaman).
4.	 Data At-rest Capture Prevention
	 Preventing data capture at-rest is the area in which
chief information officers are feeling the most heat.
Lost backup tapes, flash thumb drives, and laptops
are the most concerning. Even primary online data,
however, has increasingly become a concern. Once again, encryption comes into play. Encryption for primary data may impose too much latency for some applications' performance requirements, given the overhead of encrypting and decrypting.
Encryption today is an absolutely mandatory
requirement for secondary data on tape, optical
disc, or hard disk. Encryption for secondary data (a.k.a. backed-up or replicated data) can take place in the data protection software or on an appliance in front of the tape drive/library, hard disk, or optical disc (MaXXan, NeoScale, Network Appliance/Decru).
	 For primary application data, IT and application
administrators must make a decision about the
security risk versus the performance degradation.
These decisions will be coming for many enterprise
organizations on every application. Measuring the
security risk will be key. Determining how much
data is at risk as well as the value or negative
impact of the loss of that data, and evaluating if
non-encrypted security efforts will be enough,
will vary by application and organization.
	 Encryption at-rest can occur in the application
server with a SAN-based key management
appliance (BitArmor, EMC/RSA, Ingrian,
Vormetric), or the point-based SAN appliance
(MaXXan, NeoScale, Network Appliance/Decru).
	 The most important aspect of encryption at-rest
is the key management. Key management is not
an issue for encryption in-flight because the keys
are only temporary. It is a different story for data
encryption at-rest. Some data may have storage
requirements that span decades. Data retention
periods can range from a few years to 30 or more.
The Sarbanes-Oxley Act, for example, states
that companies must save electronic records and
messages (e-mail/instant messages) for at least five
years to ensure that auditors and other regulators
can easily obtain requested documents. The Basel
II Capital Accord requires banks to maintain three
to seven years of data history.
Figure 9: Record Retention Periods Mandated by Various Regulations in the United States
°	 Government (OSHA): toxic exposure records — 30 years
°	 Life Sciences/Pharmaceutical (21 CFR Part 11): records for food (manufacturing, processing, packing) — 2 years after release; records for drugs — 3 years after distribution; records for bio products — 5 years after end of manufacture
°	 Healthcare (HIPAA): hospital medical records (either original or legally reproduced form); medical records for minors — from birth to 21, possibly life; medical records — 2 years after patient death
°	 Financial Services (SEC 17(a)-4): financial statements; member registration for broker/dealers — end-of-life of enterprise; customer account documents — end-of-account plus 6 years
°	 Sarbanes-Oxley: financial and correspondence data — 4 years after audit
Source: Enterprise Storage Group and SVB Alliant
	 The data encryption keys must be managed and
survive for the entire life of that data. If the
encryption keys are lost, the data is lost as well.
End users are demanding that key management
be brain-dead simple. This again comes down to
the length of time the encrypted data is stored.
There is a high probability that the personnel will
change during the time the data is stored. If the key
management is difficult and requires training, it is
likely the keys could be lost or made inaccessible.
The key management issue will make or break an
encryption-at-rest solution.
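To make the stakes concrete, the sketch below (hypothetical names and structure, not any vendor's implementation) shows a key registry in which each encrypted object records the ID of the key protecting it. Losing a registry entry makes the data unrecoverable; deliberately destroying one is the same mechanism that underlies digitally certified data destruction:

```python
import secrets
import time

class KeyRegistry:
    """Toy long-lived key registry: the keys must outlive the data they protect."""

    def __init__(self):
        self._keys = {}

    def create_key(self, retention_years):
        """Mint a key and record how long it must be retained."""
        key_id = secrets.token_hex(8)
        self._keys[key_id] = {
            "key": secrets.token_bytes(32),
            "retain_until": time.time() + retention_years * 365 * 86400,
        }
        return key_id

    def get_key(self, key_id):
        if key_id not in self._keys:
            # A lost key means every copy of the ciphertext is unrecoverable.
            raise KeyError("key %s lost; data encrypted under it is unreadable" % key_id)
        return self._keys[key_id]["key"]

    def destroy_key(self, key_id):
        # Destroying the key renders all copies of the encrypted data
        # unreadable -- the basis of crypto-shredding.
        del self._keys[key_id]

registry = KeyRegistry()
key_id = registry.create_key(retention_years=30)  # e.g. a toxic-exposure record
assert len(registry.get_key(key_id)) == 32
registry.destroy_key(key_id)
```

A registry like this must itself survive personnel turnover and decades of storage, which is why end users demand that key management require no training.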
5.	 Audit Trail of All Access and Changes
	 This aspect of security data protection may seem
inherently obvious. It provides the forensic
capabilities of tracking users who attempt or
succeed at unauthorized access. This capability is
akin to video monitoring. It keeps a record in case
it is needed. Audit trail capability is considered a
must-have feature of any security implementation.
It is neither a separate product nor a market.
6.	 Guaranteed Data Change Prevention
	 This is a relatively new requirement for data
protection security. It can be traced to compliance
regulations, legislation, and e-discovery. It
is especially important for e-discovery, which
examines whether an original electronic document
has been altered in any way. The outcome of
litigation can hang in the balance based on
evidence that the original document has or has
not been altered.
	 This type of security data protection requires
what is known as a cryptographic hash function
to authenticate that the data has not changed.
This type of authentication is a feature that can be found in content addressable storage, also known as CAS (Archivas, Caringo, EMC, Nexsan, Permabit). CAS is a specially modified server file system front-ending direct attached storage, SAN storage, or network attached storage (NAS).
	 Another methodology is to embed change security
protection directly into each application file system
(BitArmor, EMC/RSA).
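The mechanism can be illustrated with a standard cryptographic hash such as SHA-256 (a generic sketch; no claim is made about which hash any particular CAS product uses). The digest of a document serves as its content address, and any alteration yields a different digest:

```python
import hashlib

def content_address(data):
    """A fixed-length digest that serves as the object's content address."""
    return hashlib.sha256(data).hexdigest()

original = b"Exhibit A: the original electronic document"
address = content_address(original)

# The same bytes always hash to the same address...
assert content_address(original) == address
# ...while any alteration, even one character, yields a different address.
assert content_address(b"Exhibit A: the original electronic document!") != address
```

Comparing a document's current digest against its stored address is what lets e-discovery establish that the original has not been altered.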
7.	 Digitally Certified Data Destruction
	 Just as making sure that data has not been altered
is incredibly important, so too is making sure that
when data reaches its end of life it is thoroughly
and legally destroyed. This requires digital
certification that the data has been destroyed. This
covers not just the primary data but the secondary
or backup data as well.
	 Some innovative vendors have clever ways of
providing digitally certified data destruction. The
easiest way is to destroy the encryption keys. This
makes all copies of the encrypted data unreadable
and legally destroyed; however, it does not destroy
any unencrypted data. Nor does it destroy backup
data that may have copies of the keys.
	 Digitally certified data destruction is provided
within the overall security (BitArmor, EMC/RSA, Vormetric) or backup/replication solution (Asigra).
	 A much more detailed examination of the security
market, the vendors, and the solutions can be
found in the SVB Alliant report “Securing Against
the Internal Threat,” August 2006.
Market Research

Market Requirements
Successful products fulfill market requirements.
Market research identifies market requirements
while revealing unexpected information. Interviews by Dragon Slayer Consulting with 178 enterprises and small or medium-sized enterprises (SMEs) over the first half of 2006 have brought to light new
information that contradicts the experts and much
of the conventional wisdom. SMEs are generally
defined as businesses under 500 employees. Figure
10 summarizes the results of the SME survey.
SME survey comments included:
°	 Tired of multiple management schemes for different
data protection products, especially those requiring
partnerships with other vendors and products.
°	 Willing to go with a less comprehensive data
protection scheme to reduce management
complexity.
°	 Looking for point-and-click simplicity that allows
assignment of protection levels and technologies
by application requirements versus infrastructure
limitations.
°	 Really want a standardized methodology for determining the cost of application outages, making it easier to assign the most cost-effective data protection scheme per application.
Dragon Slayer Consulting interviews of 63 small or
medium-sized businesses (SMBs) in the first half of
2006 produced somewhat different results. SMBs
are generally defined as businesses with more than
five and less than 100 employees.
Figure 10: 2006 U.S. SME Data Protection Survey
Percent who agree or strongly agree, in descending order:
1.	 Want to lower the cost of data protection while raising the functionality
2.	 Prefer disk over tape backup for data protection if priced cost-effectively
3.	 E-discovery becoming an important data protection driver, and its importance is growing
4.	 Prefer one data protection system to cover both the data center and the ROBO
5.	 Want data protection to guarantee that data is recoverable
6.	 Worried that current data protection is not effectively protecting ROBO and could be non-compliant
7.	 Want vendors to correct the situation quickly and are exploring alternatives
8.	 Security is currently one of the highest priorities for data protection
9.	 Replication of primary data is an equally high priority
10.	 De-duplication can significantly reduce data protection WAN and storage cost
11.	 Demanding simpler, easier-to-use data protection technology
12.	 Demanding management of the data protection (even automatic) that requires no user training
13.	 Single management for all data protection (VTL, CDP, backup disk or tape, archival, security, RPO/RTO)
14.	 Would like to assign the level of data protection by application
Source: Dragon Slayer Consulting
Comments included:
°	 Simplifying data protection is important, and any product that does so gets a look.
°	 Really want a single product, single interface,
single training for all data protection needs.
What the Numbers Mean
First and foremost the enterprise and SME
markets are very similar, whereas the SMB market
is markedly different.
°	 The enterprise and the SME are willing to entertain
different levels of data protection and technologies
that are assignable by application. The SMB is for
the most part not that sophisticated and is looking
for a single product for all of its data protection
needs. Whereas the enterprise and the SME would
prefer a single product, they want a product that
incorporates sophisticated features and multiple
technologies that give them flexibility. The SMB is
quite happy to have one technology where all data
is treated equally.
°	 The appeal of tape is rapidly declining as a primary
data protection medium for the enterprise, SME,
and SMB as disk based solutions become more
cost-effective per gigabyte.
°	 Service providers have greater appeal to the SMB
than to the enterprise or SME. The enterprise and
the SME are more likely to offer data protection
as an internal chargeable service.
°	 Security is in the forefront of the minds of the
enterprise and the SME. Many are planning to
do something within the next 24 to 30 months.
Figure 11: 2006 U.S. SMB Data Protection Survey
Percent who agree or strongly agree, in descending order:
°	 Don't differentiate application data (treat all the same)
°	 Prefer disk over tape backup for data protection if priced cost-effectively
°	 Lack data protection administration skills and want brain-dead simple data protection
°	 Service provider at right price would be attractive alternative for data protection
°	 Storage security not a high priority at this time and foresee no change soon
Source: Dragon Slayer Consulting
Security for the SMB has not risen above the
firewall and virus-scanning levels and will not for
the foreseeable future.
°	 For the enterprise and the SME, ROBO data
protection is becoming a major issue that must
be resolved for compliance. It is a nonissue for
the SMB.
°	 All three markets are becoming much more cost
sensitive to their data protection solutions and are
demanding greater simplicity in implementation,
operations, and management.
NGDP Direction over the Next Three to Five Years

Predicting the future can be a precarious and risky proposition; nevertheless, there are some logical conclusions that can be drawn from past experience, current trends, legislation, regulation, market research, and educated guesswork.
1.	 Narrowly focused feature markets will
become part of more comprehensive data
protection products.
	 This has already started. CDP is a zero-cost feature
of Asigra’s Televaulting distributed backup product
(agentless). CDP is a feature of CommVault’s
QiNetix product suite (utilizing the same agent
and database as its backup, replication, storage
resource management, and software monitoring).
CDP is also a feature of Symantec’s Backup
Exec 10d product line (utilizing the same agent
and database as the backup software). Security
in the form of ACLs, encryption in-flight, and
encryption at-rest now appear as options in data
protection software from Atempo, Asigra (no
charge), EMC/Legato, EVault, IBM/TSM, Iron
Mountain, and Symantec. We believe that point
solution vendors will continue to be bought by
larger players and do not have large enough end
markets to survive on their own.
2.	 Some Band-Aid product markets will most
likely fade away or become insignificant.
	 Despite some industry analyst views on VTLs’
becoming a multibillion-dollar market, and while
we recognize that the tape market is huge and that
it will take companies years to transition from
tape to disk, we question the viability of the stand-
alone VTL vendor. We believe that VTLs need
to be part of a more broad-based platform. There
are a couple of reasons why we take this position.
First, we believe that storage vendors with large
installed bases of tape customers will want to
protect their turf by providing multiple backup
and archive solutions beyond just partnerships.
Second, because we believe that the open server
VTL market appears to be one of those in which
the value proposition declines over time—the
exception being the mixed open server and
mainframe VTL markets—a stand-alone company
will need to quickly innovate, add new products
or re-create itself.
3.	 Security will become table stakes in enterprise
and SME data protection products.
	 EMC’s acquisition of RSA, Network Appliance’s
acquisition of Decru, Symantec’s acquisition of
VERITAS, and the increasing number of data
protection products with integrated security are
indicative of the increasing importance of security.
4.	 Enterprises, SMEs, and SMBs will require all
data protection products to become more
sophisticated and automated.
	 Sophisticated does not mean more complicated to
the user. It generally means less complicated and
easier to use while providing more functionality
with increased automation. An example of this
automation will be the calculation of application
outage costs. Determining the real organizational
cost per application outage is very important in
figuring out the level of data protection that is
required for that application. Although it is a
simple standardized cost formula (see next page),
most end users do not know how to calculate an
application outage cost to determine RPO, RTO,
and technology to utilize.
	 Cost per Application Outage = (RPO + RTO) ×
(HR + LR) × Length of Outage in hours
°	 RPO = recovery point objective, or the amount of
data that can be lost per application per outage
°	 RTO = recovery time objective, or the time it takes to
be back in operation per application per outage
°	 HR = lost worker productivity per hour of downtime,
or the cost per hour of the nonproductive worker
°	 LR = lost revenue per hour of downtime
	 Calculating the cost per application outage
must become an easy automated tool for end
users, enabling them to make knowledgeable,
informed decisions and shortening the vendors’
sales cycles.
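The report's formula translates directly into a small function; the figures in the example below are hypothetical:

```python
def cost_per_application_outage(rpo_hours, rto_hours, hr, lr, outage_hours):
    """Cost per Application Outage = (RPO + RTO) x (HR + LR) x length of outage.

    rpo_hours:    recovery point objective (data loss window), in hours
    rto_hours:    recovery time objective (time to resume operation), in hours
    hr:           lost worker productivity per hour of downtime, in dollars
    lr:           lost revenue per hour of downtime, in dollars
    outage_hours: length of the outage, in hours
    """
    return (rpo_hours + rto_hours) * (hr + lr) * outage_hours

# Hypothetical example: 1-hour RPO, 3-hour RTO, $5,000/hour of lost
# productivity, $20,000/hour of lost revenue, and a 4-hour outage.
assert cost_per_application_outage(1, 3, 5_000, 20_000, 4) == 400_000
```

Tightening the RPO or RTO for a given application shrinks the first factor, which is how the formula ties the choice of data protection technology to a dollar figure.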
5.	 ROBO will be an integral part of data protection products.
	 It will no longer be financially practical to relegate
the ROBO data to the same practices as data
center data because of the increasing costs and
the decreasing remote skills. This means that
data protection products must take into account
ROBO issues, including bandwidth and once
again security. With numerous emerging ROBO products encroaching on traditional data center functionality, we anticipate a future where products do both; we expect large vendors to seek growth in the ROBO market, and it will become a make-versus-buy decision.
6.	 E-discovery will push more data online versus offline, accelerating the decline of tape.
	 Time is money—and never more so than with
attorneys. The United States and Europe are
litigious societies, with litigation increasing every
year. It is logical to conclude that protected and
archival data that has traditionally been offline will
become increasingly online (on disk versus tape).
And it will have extensive fast-search capabilities.
7.	 The majority of data protection products will be disk based, with tape being used primarily for long-term archival storage.
	 Disk is faster for all types of recoveries, such as
CDP. The increasing requirements for speed of
recovery, convenience, lower cost of media, and
de-duplication will drive this trend.
8.	 De-duplication will become integral to data
protection products and not an add-on.
	 De-duplication is already part of the distributed
ROBO backup offerings from Asigra, EMC/
Avamar, and Symantec. The hardware costs,
savings, and bandwidth reduction for ROBO
are very compelling and will become table
stakes within the next three to five years. De-
duplication is also part of the VTL offerings from Data Domain, Diligent, and FalconStor (COPAN, EMC, HP, IBM), with Quantum/ADIC/Rocksoft planning a wide variety of D2D products leveraging its patented Rocksoft blocklet de-duplication technology. Those who lack de-duplication will be at a long-term disadvantage and potentially noncompetitive.
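As an illustration of the single-instance idea, the toy sketch below de-duplicates a byte stream on fixed-size blocks; production approaches such as Rocksoft's blocklets use variable-length chunking, which this sketch does not attempt:

```python
import hashlib

def dedupe(stream, block_size=4096):
    """Split a stream into blocks, storing each unique block exactly once."""
    store = {}    # digest -> block bytes (unique blocks only)
    recipe = []   # ordered digests needed to reconstruct the stream
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original stream from the unique-block store."""
    return b"".join(store[d] for d in recipe)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # heavily repeated content
store, recipe = dedupe(data)
assert restore(store, recipe) == data
assert len(recipe) == 4 and len(store) == 2       # 4 blocks, only 2 stored
```

Because only the short recipe of digests plus any previously unseen blocks must travel over the WAN, the same mechanism is what drives the bandwidth savings for ROBO backup.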
The data protection vendors will have to develop
and integrate this broadening base of functionality
by themselves, acquire it, or be acquired. This is a
classic build-versus-buy decision.
Conventional wisdom is that if the vendor cannot
develop all of the functionality required by the
market, it should consider strategic partnerships.
That has been the historic roadmap, and the logic
is that if it worked in the past, it should work
in the present. Like most conventional wisdom,
it is correct for the past and not necessarily the
present or future. NGDP strategic partnering
requires multiple licenses, multiple management
touch points or interfaces, multiple maintenance/
subscription agreements, multiple agents (each using
a couple of percentage points of CPU processing
power), broader administration technical skills, and
much more training. This is contradictory to the
IT organization’s intensifying trend of reduced
skills, doing more with less, increasing responsibility
with fewer people, and a focused effort on making
things easier.
We believe that users are going to demand that
vendors provide a suite versus a point product for
their NGDP solutions. Therefore, the most viable
alternative is for the NGDP market to consolidate.
Failure to keep up with market requirements will relegate many current and leading vendors to historical footnotes. As with most maturing markets, there is a high likelihood of increased M&A activity, with momentum accelerating over the next three to five years as market leaders, faced with the innovator's dilemma, have difficulty meeting NGDP market requirements and it becomes increasingly difficult for startup vendors to offer enough to break out from the pack.
Conclusions
Table 7: Select Storage Industry M&A Transactions ($ in millions, except per share value)
Data as of January 10, 2007

Announce Date | Acquirer | Target | Target Business Description | Subsector | Transaction Value | Target LTM Revenue | Revenue Multiple
12/21/06 | Seagate | EVault | Provides continuous data protection and recovery software | CDP | $185.0 | $35.0 | 5.3x
11/27/06 | Symantec | Revivio | Provides enterprise-class continuous data protection and recovery software | CDP | N/A | N/A | N/A
11/08/06 | Network Appliance | Topio | Provides disaster recovery data replication and migration software | Replication | 160.0 | 7.5 | 21.3x
11/01/06 | EMC | Avamar | Develops enterprise data protection software solutions | De-duplication/ROBO | 165.0 | 22.0 | 7.5x
10/25/06 | LSI Logic | StoreAge | Provides SAN storage management and advanced, multi-tiered data protection solutions | CDP | 50.0 | ~5.0 | 10.0x
09/25/06 | GlobalSCAPE | Avail | Provides software solutions to deliver remote-site file sharing, continuous data and database backup, and acceleration | SAS/backup | 9.7 | N/A | N/A
07/11/06 | CA | XOsoft | Provides continuous application and information solutions | CDP | 80.0 | 15.0 | 5.3x
05/09/06 | EMC | Kashya | Provides disaster recovery, continuous remote replication, and continuous data protection solutions across various storage area networking environments | Replication | 153.0 | 6.0 | 25.5x
05/02/06 | Quantum | ADIC | Supplier of automated tape systems, data management software, storage networking appliances, and disk-based backup and restore solutions | Backup | 513.0 | 472.8 | 1.1x
04/12/06 | Crossroads Systems | Tape Laboratories | Solutions include data compression, tape arrays, interface emulation, and disk-based virtual tape solutions | VTL | N/A | N/A | N/A
03/15/06 | ADIC | Rocksoft | Provides redundant-data elimination technologies and products, including blocklets, and a software development kit | De-duplication | 65.0 | Pre-revenue | N/A
03/06/06 | Atempo | Storactive | Provides content data protection software for Microsoft Windows; solutions include LiveBackup, a client/server, real-time backup software to deliver automatic data backup and end-user file recovery | CDP | N/A | 5.0 | N/A
12/01/05 | Iron Mountain | LiveVault | Provides disk-to-disk backup and data recovery services; offers LiveVault InSync, a tape-free server backup and recovery service that provides automatic backup, offsite data storage, data restore, and protection of open databases and files | Backup/recovery | 50.0 | 10.0 | 5.0x
11/21/05 | SonicWALL | Lasso Logic | Provides optical disc data protection systems, including a continuous data protection appliance and a tape backup replacement solution; additionally, provides real-time continuous data protection for servers, laptops, and personal computers locally and offsite | CDP | 20.0 | 2.0 | 10.0x
11/16/05 | BakBone Software | Constant Data | Provides data integration, protection, and storage management; also develops data replication and clustering software for heterogeneous server and operating environments | Replication | 5.5 | N/A | N/A
08/17/05 | Seagate | Mirra | Networked digital content protection and backup products; offers Mirra Personal Server, a personal computer backup system to back up, access, and share digital files | Backup | 15.0 | N/A | N/A
08/08/05 | Overland Storage | Zetta Systems | Offers ZettaServer IR, which virtualizes the physical storage attached to it and presents it as block level; and ZettaServer NAS, a Web-based user interface used to access the snapshot files through hidden folders | SAS | 9.0 | Pre-revenue | N/A
04/20/05 | VERITAS | Data Center Technologies | Provides remote backup applications via a content addressable storage engine | Data reduction | 60.0 | 1.5 | 40.0x
04/07/05 | Network Appliance | Alacritus Software | Virtual tape library and CDP backup software; offerings include continuous data protection technology and open systems virtual tape library appliance software | VTL/CDP | 11.0 | N/A | N/A
10/12/04 | EMC | Dantz Development | Products protect computers by providing backup and recovery for file servers, desktops, notebooks, and business-critical applications for SMBs | Backup/recovery | 45.0 | N/A | N/A
10/12/04 | Iron Mountain | Connected Corporation | Archiving, data protection, services, and software that support personal computers, over the Internet and on corporate intranets | Backup services | 117.0 | 36.0 | 3.3x
03/30/04 | Mendocino Software | Vyant Technologies | Offerings include data storage, backup, replication, and professional services | Data protection/replication | N/A | N/A | N/A
Mean | | | | | | | 12.2x
Median | | | | | | | 7.5x

Transaction value reflects adjustment for cash and debt from purchase price.
Legend: LTM = Last 12 months; N/A = Not applicable
Sources: Capital IQ, The 451 Group, SVB Alliant, company press releases and Web sites, and miscellaneous news articles
Table 8: Select Storage Industry Company Comparables ($ in millions, except per share value)
Data as of January 10, 2007

Company | Price 1/10/07 | 52-Wk High | 52-Wk Low | LTM Price Change (%) | Equity Market Value | Enterprise Value | EV/Revenue CY05A | EV/EBITDA CY05A | EV/Earnings CY05A | Revenue CY05A | Cash CY05A | Debt CY05A
BakBone Software | $1.40 | $2.90 | $0.95 | (12) | $103 | $84 | N/A | N/A | N/A | N/A | $18.7 | $0.0
Brocade | 8.54 | 9.42 | 4.12 | 99 | 2,339 | 1,797 | 3.1x | 18.3x | 72.4x | 582.6 | 542.1 | 0.0
CA | 24.84 | 29.50 | 18.97 | (14) | 13,222 | 14,515 | 3.8x | 13.3x | 59.0x | 3,804.0 | 1,295.0 | 2,588.0
CommVault | 19.29 | 20.74 | 14.74 | N/A | 886 | 846 | 8.1x | 69.8x | 79.0x | 104.3 | 50.2 | 10.0
EMC | 14.36 | 14.75 | 9.44 | 8 | 32,365 | 31,895 | 3.3x | 14.4x | 28.1x | 9,664.0 | 2,669.5 | 2,200.0
FalconStor | 8.23 | 9.80 | 5.99 | 0 | 424 | 385 | 9.4x | 73.9x | N/M | 41.0 | 39.0 | 0.0
HP | 42.20 | 42.39 | 29.00 | 37 | 122,253 | 111,026 | 1.3x | 14.3x | 41.4x | 87,901.0 | 16,422.0 | 5,195.0
IBM | 98.89 | 100.33 | 72.73 | 18 | 153,873 | 164,963 | 1.8x | 9.5x | 20.8x | 91,134.0 | 10,901.0 | 21,991.0
Iron Mountain | 27.31 | 29.91 | 22.64 | (2) | 5,424 | 8,013 | 3.9x | 14.1x | 72.1x | 2,078.2 | 45.4 | 2,634.8
LSI Logic | 9.44 | 11.81 | 7.41 | 2 | 3,808 | 3,162 | 1.6x | 11.2x | N/M | 1,919.2 | 1,268.1 | 622.0
Microsoft | 29.66 | 30.26 | 21.46 | 10 | 308,909 | 280,657 | 6.8x | 15.6x | 21.5x | 41,359.0 | 28,252.0 | 0.0
Network Appliance | 39.73 | 41.56 | 25.85 | 39 | 15,924 | 14,738 | 7.7x | 38.4x | 54.5x | 1,920.3 | 1,379.4 | 193.4
Overland Storage | 4.34 | 11.32 | 3.63 | (55) | 56 | 10 | 0.0x | N/M | N/M | 232.8 | 45.8 | 0.0
Quantum | 2.37 | 4.02 | 1.90 | (24) | 460 | 966 | 1.1x | 23.5x | N/M | 868.6 | 150.9 | 656.5
Seagate | 26.81 | 28.11 | 19.15 | 15 | 15,855 | 15,670 | 1.8x | 9.9x | 14.7x | 8,536.0 | 2,653.0 | 2,468.0
Sun Microsystems | 6.00 | 6.25 | 3.74 | 33 | 21,367 | 17,980 | 1.5x | 28.8x | N/M | 11,664.0 | 3,971.0 | 584.0
Symantec | 21.46 | 22.19 | 14.78 | 12 | 20,422 | 19,568 | 5.4x | 16.6x | N/M | 3,617.5 | 2,954.2 | 2,100.0
Median | | | | | | | 3.2x | 15.6x | 47.9x
Mean | | | | | | | 3.8x | 24.8x | 46.3x

Legend: A = Annual; CY = Calendar year; EBITDA = Earnings before interest, taxes, depreciation, and amortization; EV = Market capitalization plus debt, less cash; LTM = Last 12 months; N/A = Not applicable; N/M = Not meaningful
Sources: Capital IQ, The 451 Group, SVB Alliant, company press releases and Web sites, and miscellaneous news articles
° ACLs	 access control lists
° bots	 servers that crawl the Internet looking for content 	
	 of interest based on the search parameters; they 	
	 can search for security flaws in a system and then 	
	 catalog and report them, or exploit them
° CAS	 content addressable storage
° CDP	 continuous data protection
° CDT	 continuous data technology
° CFR	 Code of Federal Regulations
° CPU	 central processing unit
° D2D	 disk-to-disk
° DR	 disaster recovery
° e-discovery	 electronic data discovery
° EU	 European Union
° EU DPD	 European Union Data Protection Directive
° FICON	 fiber connectivity
° GSIS	 global single instance storage
° GLBA	 U.S. Gramm-Leach-Bliley Act
° HDD	 hard disk drive
° HIPAA	 Health Insurance Portability and Accountability Act
° HR	 lost worker productivity per hour of downtime
° ILM	 information life cycle management
° IM	 instant messaging
° malware	 viruses, worms, bots, keystroke mappers
° NAS	 network attached storage
° nearline storage	 user and application accessible storage with lower	
	 performance than online
° NGDP	 next generation data protection
° OSHA	 Occupational Safety and Health Administration
° PC	 personal computer
° ROBO	 remote office branch office
° RPO	 recovery point objective
° RTO	 recovery time objective
° SAN	 storage area network
° SATA	 serial advanced technology attachment
° SAS	 small aperture snapshot
° SIS	 single instance storage
° SMB	 small or medium-sized business
° SME	 small or medium-sized enterprise
° SNIA	 Storage Networking Industry Association
° TB	 terabyte
° TCP/IP	 Transmission Control Protocol/Internet Protocol
° VPN	 virtual private network
° VTL	 virtual tape library
° WAN	 wide area network
Table 10: Acronyms and Abbreviations
Source: Dragon Slayer Consulting
Table 9: Interview Tables

2006 SME Survey Results (178 surveyed)

Issue | Strongly Agreed or Agreed | Percentage
Would like to assign the level of data protection by application | 87 | 49%
Single management for all data protection (VTL, CDP, backup disk or tape, archival, security, RPO/RTO, etc.) | 87 | 49%
Intuitive management of the data protection (even automatic) that requires no user training | 94 | 53%
Demanding simpler, easier-to-use data protection technology | 94 | 53%
De-duplication can significantly reduce data protection WAN and storage cost | 94 | 53%
Replication of primary data an equally high priority | 101 | 57%
Security is currently one of their highest priorities for data protection | 119 | 67%
Want their vendors to correct the situation quickly and are exploring alternatives | 123 | 69%
Worried current data protection not effectively protecting ROBO and could be non-compliant | 123 | 69%
Want their data protection to guarantee data is recoverable | 130 | 73%
Prefer one data protection system to cover both the data center and the ROBO | 139 | 78%
E-discovery becoming important data protection driver and importance growing | 139 | 78%
Prefer disk over tape backup for data protection if priced cost-effectively | 144 | 81%
Want to lower the cost of data protection while raising the functionality | 144 | 81%

2006 SMB Survey Results (63 surveyed)

Issue | Strongly Agreed or Agreed | Percentage
Storage security not a high priority at this time and foresee no change soon | 43 | 68%
Service provider at right price would be attractive alternative for data protection | 47 | 75%
Lack data protection administration skills and want brain-dead simple data protection | 56 | 89%
Prefer disk over tape backup for data protection if priced cost-effectively | 57 | 90%
Don't differentiate application data (treat all the same) | 59 | 94%
Select SVB Alliant Transactions

[Tombstone graphics: a series of acquisitions advised by SVB Alliant between April 2001 and November 2006, including one by VERITAS (now Symantec), plus a $13,000,000 Series B preferred stock financing and a $12,000,000 Series D preferred stock financing; the company names appeared only as logos.]
SVB Alliant is an investment banking firm providing M&A and private capital advisory services to technology and life science companies. SVB Alliant's
expertise spans the technology landscape, with
deep subject-matter and execution experience in
semiconductors, communications, storage, security,
networking, peripherals and capital equipment, the
Internet, software and services, and life sciences.
The firm has offices in Palo Alto, California, and
Boston and an affiliate in London. SVB Alliant
is a member of global financial services firm SVB
Financial Group, with SVB Silicon Valley Bank,
SVB Analytics, SVB Capital, SVB Global, and
SVB Private Client Services. Additional information
is available at www.svballiant.com.
About SVB Alliant
If you would like more information about
the NGDP market, please contact:
Rick Dalton
SVB Alliant
Phone: 650.330.3799
E-mail: rdalton@svballiant.com
Melody Jones
SVB Alliant
Phone: 650.330.3076
E-mail: mjones@svballiant.com
Contact
Information
References
1.	 Financial Services Modernization Act, Gramm-Leach-Bliley, United States Senate, http://banking.senate.gov/conf/grmleach.htm.
2.	 European Union Data Protection Directive, European Union, http://www.cdt.org/privacy/eudirective/EU_Directive_.html.
SVB Alliant, as part of its business, is regularly engaged in providing M&A and private placement advisory services to technology and life science companies. We may have in the past and may currently or in the future provide such services for a transaction-based fee to one or more of the companies mentioned in this piece.
These securities offered have not been registered under the U.S. Securities Act of 1933, as amended, and may not be offered or sold in the United States absent
registration or an applicable exemption from registration requirements.
This material, including without limitation the statistical information herein, is provided for informational purposes only. The material, including all forward-looking
projections, is based in part on information from third-party sources that we believe to be reliable, but neither the material nor the sources have been independently
verified by us. As a result,  we do not represent that the information is accurate or complete. Nothing relating to the material should be interpreted as a recommendation
or solicitation or offer to buy or sell the securities of the companies mentioned herein.
SVB Alliant is a wholly owned broker-dealer subsidiary of SVB Financial Group, the parent company of Silicon Valley Bank. Member NASD/SIPC. SVB Alliant’s
services are not bank products or services. The services of SVB Alliant are not guaranteed by the bank, are not FDIC-insured and may lose value.
SVB Alliant Europe Ltd. is registered in England and Wales at 34 Dover Street, London, W1S 4NG, U.K. under No. 5089363 and is authorised and regulated by
the Financial Services Authority.
All material presented, unless specifically indicated otherwise, is under copyright to SVB Alliant and its affiliates and is for informational purposes only. None of
the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied, or distributed to any other party without the prior express written
permission of SVB Alliant. All trademarks, service marks, and logos used in this material are trademarks, service marks, or registered trademarks of SVB Alliant or
one of its affiliates or other entities.
© 2006 SVB Alliant℠. All rights reserved.
SVB Alliant
Headquarters
181 Lytton Avenue
Palo Alto, California 94301
U.S.
SVB Alliant
Boston
2221 Washington Street
One Newton Executive Park, Suite 200
Newton, Massachusetts 02462
U.S.
SVB Alliant Europe Ltd.
London
34 Dover Street
5th Floor
London W1S 4NG
U.K.

Executive Summary

It is often said that the only constant in technology is change. Never has this been more true than with the current state of the data protection market. In the past, data protection had the justifiably earned reputation of being a fairly stable and static market. It was primarily known as simply backup-to-tape or storage array-based replication. Backup-to-tape market changes were typically evolutionary and incremental. Often the changes involved higher-density and faster tapes, speedier tape drives, better robots with greater-density tape libraries, or more operating systems and databases covered by the backup software. Changes in storage array replication were also evolutionary, usually involving different physical interfaces (such as gigabit Ethernet, faster fibre channel, fiber connectivity [FICON]), more snapshots, higher capacities, or incremental snapshots.

The 21st century era of regulatory and legal compliance, the swelling threats to internal and external security, and the exponential increase in electronic data discovery (e-discovery) around litigation have changed the data protection market forever. Data protection must do more now than ever before. Protected data must be preserved (often for lengthy periods), classified, searched, migrated by policy or age to lower-cost storage, compressed, de-duplicated, encrypted, restored on command, and eventually provably destroyed.

The first next generation data protection (NGDP) products have already exploded onto the market. These include virtual tape libraries (VTLs), continuous data protection (CDP), continuous snapshot (a.k.a. small aperture snapshot [SAS]), distributed backup to disk from remote and branch offices, de-duplication, and security including encryption.

This paper examines the new technologies of the NGDP market, going past the trade press hype and digging into their real value propositions. It then compares most of the players in each of the technology market niches. Finally, it predicts how the NGDP market will rapidly evolve over the next three to five years.
Introduction

The staid storage software market no longer deserves the reputation of being a weary bore. It has been experiencing rapid and radical change. It is no longer good enough to simply store data and back it up in case of a disaster. Today's data is stored, classified, aged, moved to the appropriate storage tier based on application and data value (determined by organizational usage as well as regulations or laws), moved again (based on aged value), encrypted, sorted, archived, and even digitally shredded. Known as information life cycle management (ILM), this practice is rapidly being implemented by medium-to-large IT organizations as part of their tiered storage strategy.

ILM is the tip of the iceberg. The unprecedented expansion of stored data, doubling every 12 to 24 months on average, is causing the data protection market to undergo far-reaching changes, as evidenced by the explosive proliferation of new technologies, products, startups, and perceived new markets.

It's not just the changes and the growth in the primary data storage market that are driving the changes in the data protection market. Identity theft, regulatory compliance, e-discovery, smarter cyber-crooks, plus an exponential increase in virus attacks, worms, and bots are also driving it. It seems as if every week a new report of lost, pilfered, purchased, or stolen data appears in the news. Well-known brands are being embarrassed by human error and outright theft, including such easily recognizable names as ABN Amro, Bank of America, Citibank, Fidelity Investments, Hotels.com, Marriott, the U.S. Veterans Administration, and the list goes on.

Then there is the increasing mission-critical reliance on e-mail and instant messaging (IM). E-mail/IM/BlackBerry servers such as Microsoft Exchange can and often do crash, leaving organizations scrambling for hours, days, weeks, and even months to recover their data.
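The policy-driven tiering, aging, and eventual destruction that ILM describes can be sketched as a simple age-based rule. This is a minimal illustration only; the tier names and thresholds below are ours, not drawn from any product:

```python
# Illustrative age-based ILM tiering: each tier carries a maximum data age
# (in days); data older than a tier's threshold migrates to the next tier.
TIERS = [
    ("primary disk", 30),    # hot data stays on premium storage
    ("low-cost disk", 90),   # aging data moves to cheaper disk
    ("tape archive", 2555),  # roughly seven-year retention, per policy
]

def place_data(age_days):
    """Return the tier where data of the given age belongs, or 'destroy'
    once every retention threshold has been exceeded."""
    for tier, max_age in TIERS:
        if age_days <= max_age:
            return tier
    return "destroy"  # provable destruction at end of life

print(place_data(10))    # primary disk
print(place_data(60))    # low-cost disk
print(place_data(400))   # tape archive
print(place_data(3000))  # destroy
```

In practice the migration rule would weigh application and data value, not just age, and the thresholds would come from the regulations and organizational policies discussed above.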
Legislative bodies, regulatory agencies, and numerous worldwide consumer groups are taking more than a little interest. There is now extensive, albeit sometimes vague, legislation and regulation about data protection, retention, recoverability, and even encryption specifying liability costs for failure to ensure data privacy in the following areas:

° Finance
	• Federal Reserve Board regulations
	• Securities and Exchange Commission (SEC) Rule 240 17(a)-4
	• NASD Rule 2211
° Health care
	• Health Insurance Portability and Accountability Act (HIPAA)
° Pharmaceuticals
	• 21 Code of Federal Regulations (CFR) Part 11
	• European Union (EU) Annex 11
° Commerce risk management
	• Gramm-Leach-Bliley Act (GLBA)1
° Corporate governance
	• Sarbanes-Oxley Act
° Risk management
	• Basel II Capital Accord (international)
° Money laundering directives from the United States and the European Union
	• USA PATRIOT Act
	• EU Data Protection Directive (EU DPD)2
Not to mention, more than two dozen states in the U.S. have laws on the books, based on California Assembly Bill No. 1950, that spell out in detail immediate notification requirements when a consumer's data is lost or stolen if it is unencrypted, the highlight of this legislation being the term unencrypted.

Then there is the cost of compliance failures.

° The New York Data Law A.4254 states that failure to disclose data breaches can result in fines up to $150,000 per incident.
° In February 2006 U.S. investment bank Morgan Stanley offered to pay $15 million to resolve an investigation by U.S. regulators into the bank's failure to retain e-mail messages. E-mail took center stage in a $1.58 billion judgment against the company in a case that centered on the bank's inability to produce e-mail documents. The bank said that backup tapes had been overwritten.
° In Zubulake v. UBS Warburg, a gender-discrimination suit, the judge instructed the jury that it was legitimate to presume that the information UBS Warburg couldn't provide due to lost backup tapes and e-mails was probably damaging to the company's case. Zubulake was awarded $20 million. Many vendors cite this precedent as an example and a reason for customers to purchase their NGDP solutions.

This is only the beginning. More laws and regulations are on the way.

A Brief History of Data Protection

To understand a bit more about where data protection is going requires some knowledge of where it has been. In the beginning, data protection was driven by the need to recover an organization's data in the event of a disaster. Disasters were originally narrowly defined as the natural type (floods, hurricanes, typhoons, tsunamis, earthquakes, volcanic explosions, tornados, fires, and the like) or the man-made type (fires, explosions, terrorist attacks, and human error). As horrific as these events can be and often are, they are relatively rare.
Most organizations did not, and still do not, want to use high-priced primary (a.k.a. premium) disk storage for rarely used data. This scenario led to the development of lower-cost data storage. The first mass-market lower-cost storage for disaster recovery was tape. Tape was, and to a large extent today still is, much less expensive per gigabyte than primary disk storage on a raw and usable basis. Even data management and infrastructure were less expensive.

As data processing evolved over time and became more distributed and user driven, so too did data protection. Disaster recovery definitions expanded to include disasters caused by humans, including data loss from malware, errors, deletions, disgruntled employees, and criminals both inside and outside of the organization.

As end-user work became more tied to desktop and laptop personal computers (PCs), new requirements began entering the data protection picture. Data protection that works for the IT organization as a whole does not necessarily meet the needs of the individual user. Disaster recovery was proving to be far too narrow a definition for data protection. Users wanted a lot more, such as the ability to recover individual lost, deleted, or damaged files. Additionally, they did not want the hassle or time waste of restoring entire volumes to recover just a single file.
Organizational needs, typically data center focused, were also evolving. Global trade and the Internet changed the concept of operational hours, making it easier to accomplish nondisruptive backups. The amount of data that could be lost (recovery point objective, or RPO) plus the amount of time required to be up and operational (recovery time objective, or RTO) was rapidly decreasing. Technologies such as storage-based snapshots were developed to help address these issues. And as organizational RPOs and RTOs continued to decrease, snapshots evolved, adding incremental snaps of data plus significantly more of them, further driving increased storage requirements.

Backup and recovery are generally measured by two concepts: recovery time objective (RTO) and recovery point objective (RPO). The RTO defines the maximum amount of downtime a company is willing to endure while dealing with a recovery event. The RPO defines the maximum amount of data loss the company can endure in the same situation. For example, consider a company that has defined a two-hour RPO and a one-hour RTO. In the face of a data corruption event at noon, the company expects the systems to be back online and functional no later than 1:00 p.m., and the data that is used for the restoration to be no older than from 10:00 a.m.

The vast difference in user (distributed) and organizational (data center) requirements caused a data protection split. No longer could one product meet all the demands of the entire organization. Products that met the data center requirements seldom met the needs of the individual and distributed users. Products that met performance requirements did not always meet storage cost requirements. Products that met user requirements rarely met data center requirements.
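The RPO/RTO arithmetic in the noon-corruption example can be captured in a few lines. A minimal sketch, purely illustrative (the function name and dates are ours):

```python
from datetime import datetime, timedelta

def recovery_window(failure_time, rpo_hours, rto_hours):
    """Given a failure time and the organization's RPO/RTO (in hours),
    return (oldest acceptable restore point, deadline to be back online)."""
    oldest_restore_point = failure_time - timedelta(hours=rpo_hours)
    back_online_deadline = failure_time + timedelta(hours=rto_hours)
    return oldest_restore_point, back_online_deadline

# The example from the text: data corruption at noon,
# with a two-hour RPO and a one-hour RTO.
failure = datetime(2007, 1, 15, 12, 0)
restore_point, deadline = recovery_window(failure, rpo_hours=2, rto_hours=1)
print(restore_point.strftime("%I:%M %p"))  # 10:00 AM, data no older than this
print(deadline.strftime("%I:%M %p"))       # 01:00 PM, systems back by then
```

Tightening either number raises cost: a shorter RPO demands more frequent snapshots or continuous capture, and a shorter RTO demands faster restore media, which is exactly the pressure that produced the technologies below.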
Fertile Ground for Innovation

It is this volatile combination of current laws, regulations, litigious e-discovery, and historical evolution that has provided the fertile ground for clever, innovative NGDP products. The first wave of NGDP technologies and products has already appeared on the market with great fanfare and hype. These technologies include VTL, CDP, continuous snapshot, distributed backup to disk from remote office branch office (ROBO), de-duplication (both locally and globally), officially designated single instance storage (SIS) and global single instance storage (GSIS), and data encryption (in-flight and at-rest).

[Figure 1: NGDP Technology Grid. Technologies plotted by data protection level versus data recoverability: backup, virtual tape, mirroring, snapshot, replication, CDP, and distributed ROBO backup to disk. Source: SVB Alliant]
Next Generation Data Protection (NGDP) Technologies and Markets

Table 1: Select Companies in the NGDP Market

Public companies
° VTL: EMC, FalconStor, HP, IBM, Network Appliance, Quantum, Sun/StorageTek
° CDP: CA/XOsoft, CommVault, EMC/Kashya, FalconStor, IBM/Tivoli, Iron Mountain, Symantec/Revivio
° Continuous Snapshot a.k.a. SAS: Network Appliance
° Distributed ROBO Backup: EMC/Avamar, Iron Mountain, Symantec/VERITAS
° De-duplication: EMC/Avamar, FalconStor (VTL), Quantum/ADIC/Rocksoft (imminent), Symantec/Data Center Technologies (ROBO)

Private companies
° VTL: COPAN Systems, Data Domain, Diligent, Fujitsu/Siemens, Neartek, Sepaton, Spectra Logic
° CDP: Asempra, Asigra, Atempo/Storactive, FilesX, InMage, Mendocino, SonicWALL, TimeSpring
° Continuous Snapshot a.k.a. SAS: Cloverleaf, Exanet
° Distributed ROBO Backup: Asigra, EVault, Signiant
° De-duplication: Asigra (ROBO), Data Domain, Diligent, ExaGrid, Sepaton

Source: SVB Alliant
Virtual Tape Libraries (VTLs)

Virtual tape is software that provides the image of a logical tape drive on magnetic hard disk drives (HDDs). The software emulates the actual physical tape; however, all the data is in a tape-streaming format existing virtually on HDD (see Figure 2). This stages tape data on disk at a much faster rate than writing directly to tape.

Figure 2: Open Server VTL (Linux, Unix, and Windows servers backing up through a VTL to a tape library). Source: SVB Alliant

VTL value proposition claims:
1. VTLs utilize much less real tape as the data moves from virtual tape to physical tape. Many backup operations do not fill the physical tape media when applications write to the tape. VTLs can reduce the amount of tape media by 25 to 98 percent.
2. VTLs provide up to five times faster backups than backup to native tape drives.
3. VTLs have been proven to increase backup reliability by more than 25 percent over native tape drives or libraries.
4. VTLs are much easier to share on a storage area network (SAN) than native tape drives or libraries.

The largest value proposition comes from the mainframe market space. VTLs originated in the mainframe space and have become a must-have in that market. The mainframe market uses tape differently than the open server market (Linux, Windows, and Unix). Tape is an important form of nearline storage as well as a medium for backup and archiving. Mainframe tape utilization is typically incredibly low (on average less than 10 percent) because of the heavy use as nearline storage. Some tapes have only a single data set on them. VTLs radically change
that equation and increase tape utilization to greater than 90 percent using virtual tape stacking, that is, placing multiple virtual tapes on a single physical tape cartridge (see Figure 3). The value proposition for mainframes based on increased tape utilization (nine times) alone is incredibly high. Providing faster backups is another value; however, it is a distant secondary value proposition for mainframes, on par with improved tape data reliability.

The open server market is a much different story. This market uses tape only for backup and archival data storage. Typical open server tape utilization is usually greater than 60 percent. Using virtual tape stacking to increase tape utilization to more than 90 percent means a value proposition of only a 50 percent improvement (versus the ninefold improvement for mainframes). The primary open server VTL value proposition has, until recently, been derived from the increased backup and recovery speeds of disks versus tapes, improved tape data reliability, and overall lower total cost of ownership than standard tape drives and tape libraries.

And VTLs are not the only disk-to-disk (D2D) solution being deployed. Many IT organizations use CommVault, Legato, NetBackup, and others that back up natively to disk and/or compressed disk, eliminating tape entirely. Alternatively, they can back up to disk, keep the data online for 90 days, then back it up to high-density, fully populated tape, ensuring rapid data access or RTO.

More recently, open server VTL products have added de-duplication. VTL de-duplication meaningfully reduces the amount of data stored on disk or tape, by 50 to 95 percent. This reduction can quantifiably and meaningfully decrease the cost of storage, potentially making open server VTLs a no-brainer solution and a viable acquisition strategy for companies serving mainframe customers.
Figure 3: Mainframe VTL (zSeries mainframe backing up through a VTL to a tape library). Source: SVB Alliant
De-duplication is a hot new area in storage, not limited to VTL. Some companies are moving quickly into this market (Diligent and ExaGrid) to compete with the market initiator, Data Domain. Even industry giant Network Appliance is planning to release a product in the near future. This topic is discussed further later in this document.

There are two different methodologies for implementing de-duplication with a VTL: in real time, as the data is being stored, and as a background task after it has been stored.

Real-time de-duplication is done on the fly as the backup data is being received by the VTL. It de-duplicates based on the files and/or data blocks (fixed-length data blocks with hash marks) and/or blocklets (variable-length data blocks with hash marks) used by the de-duplication database for comparison, then de-duplicates before the data is written to disk. Data Domain, Diligent, ExaGrid, and FalconStor are in this market, and Quantum/ADIC/Rocksoft has announced it is entering imminently. This methodology has the advantage of utilizing disk storage space only for new data. Conversely, it tends to limit scalability. As the data scales and the de-duplication meta database grows, the speed of both the backups and the restores continuously and noticeably slows. The larger the de-duplication meta database becomes, the longer it takes to de-duplicate before writing to disk and the longer it takes to recover the data, because more and more changes must be added back to the original data as it ages.

Background de-duplication waits until the backups have been written to disk, then discards the duplicate data. The advantage of this methodology is that it scales ad infinitum. The de-duplication database never affects the speed of the backups negatively, and it provides a high degree of confidence that no unique data is deleted accidentally.
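To make the hash-keyed comparison concrete, here is a minimal, hypothetical sketch of real-time de-duplication over fixed-length blocks (illustrative only; shipping products use variable-length blocklets, hash-collision handling, and persistent on-disk meta databases):

```python
import hashlib

class InlineDedupStore:
    """Toy real-time (inline) de-duplication: fixed-length blocks keyed
    by a SHA-256 digest; a block is written only the first time it is seen."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}    # digest -> block bytes (the "disk")
        self.streams = {}   # backup name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # only new data consumes space
                self.blocks[digest] = block
            digests.append(digest)
        self.streams[name] = digests

    def read(self, name):
        # Reassemble the stream from the digest list (the "meta database").
        return b"".join(self.blocks[d] for d in self.streams[name])

store = InlineDedupStore()
payload = b"A" * 8192 + b"B" * 4096        # two identical A-blocks, one B-block
store.write("backup-monday", payload)
store.write("backup-tuesday", payload)     # a full repeat stores nothing new
print(len(store.blocks))                   # 2 unique blocks for both backups
```

Note how the meta database (here, the digest dictionaries) sits on the write path, which is exactly why its growth slows inline de-duplication at scale.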
Because it is de-duplicating after the data has been written to the VTL, background de-duplication always removes the older duplicated data rather than the newest data. (De-duplication can actually operate on the smallest atomic unit, the byte, or on any of the larger data types.) Additionally, most users and administrators typically recover the latest version of their data, or at least the latest clean version. Background de-duplication means that the data they recover will require the least amount of manipulation, leading to incredibly fast recoveries.

The downside to this methodology is that it requires very sophisticated content awareness of the data to be effective. In other words, the de-duplication engine must be able to see the files, extensions, and headers after the data is written to the VTL. Background de-duplication also requires more hard disk storage to be utilized as the cache for the de-duplication. The size of the hard disk cache continues to grow as the amount of data being backed up increases. Currently, only Sepaton has developed VTL-based background de-duplication, and it has a number of patents pending.

It is important to realize that de-duplication is not limited to VTLs. De-duplication provides equivalent or greater value for all D2D NGDP.

There is also a catch with VTL systems. The baseline underlying assumption behind the value of open server VTL systems is that data protection software backs up only to tape or to what appears to be tape, hence the term virtual tape. Through 2004 and much of 2005, major backup software products, such as those from Computer Associates (CA), CommVault, EMC, and Symantec, were limited to backing up to tape and could not natively back up to disk. If this were still true, VTLs would have a bright future with open servers.
Unfortunately for the VTL suppliers, backup software has changed. The vast majority of backup software today, as mentioned, can natively back up directly to disk or disk subsystems. And just as VTL vendors are adding de-duplication to their products, so are the backup software vendors. Many can migrate data from disk to tape or optical disc without the additional software overhead of a VTL. As a general rule, backup users contract for subscription services. Subscriptions automatically provide the software upgrades that enhance the backup software, such as backup to disk and/or de-duplication. The net effect is that as backup software has evolved and continues to evolve, it eliminates much of the open server requirement for VTLs, which in turn appreciably reduces the potential VTL addressable market.

VTLs for open servers are a textbook example of a “Band-Aid,” or short-lived, product that solves a functional shortcoming of another technology product (backup software in this case). Once the shortcoming is fixed, we believe the market disappears or greatly diminishes.

VTLs for mainframes are an altogether different story. In this market the value is still quite high. There are no inexpensive serial advanced technology attachment (SATA) or SATA II disks in mainframe direct access storage devices. Backup media will continue to be tape for the foreseeable future. And tape is still used as nearline storage. Mainframe VTLs will continue to thrive. IT organizations that have both mainframes and open servers are able to leverage the mainframe VTL investment for both environments. Those VTL vendors that have products for both environments (Diligent, EMC, Fujitsu/Siemens [CentricStor], IBM, and, expectedly, Sepaton) should continue to have success.

Another downside to VTLs is that each product has a proprietary user lock-in. There is no standard data format or algorithm for VTL systems.
Products from different vendors are incompatible with each other and often with other models from the same vendor. Many VTL products use the FalconStor VTL software (COPAN Systems, EMC, IBM, and Sun/StorageTek).

Copying and/or moving the data from a VTL to other media for long-term archiving or offsite data storage is another major issue. Data can be moved or replicated from a VTL in three different ways:
1. The VTL manages movement of data between disk and tape.
2. The backup software moves data from the VTL to tape.
3. VTL systems replicate data to an offsite VTL.

Data moved from the VTL to tape is typically not reconstructed in the native backup software format. This means that the backup software cannot recover the data natively without first going through the VTL. When migrating the data from the VTL to tape, inconsistency can be introduced into the backup software catalog if it is not integrated with the VTL. This is especially true when the catalog is not informed that the data has been migrated from the VTL to tape.

There are numerous vendors with VTL products in the market today (COPAN Systems, Data Domain, Diligent, EMC/Neartek, FalconStor, Fujitsu/Siemens, HP, IBM, Network Appliance, Quantum, Sepaton, Spectra Logic, and Sun/StorageTek). Table 2 provides a brief comparison of VTL products that are currently available.
Table 2: Select VTL Products on the Market (comparing, for each vendor and model: basic and compressed usable capacity in TBs; de-duplication, inbound (I) or background (B); virtual library and virtual drive ranges; maximum throughput in TB/hr; FC, iSCSI, FICON, and ESCON interfaces; export to native tape libraries; at-rest encryption; mainframe zOS support; offsite D2D replication; and the underlying VTL software). Vendors and models covered: COPAN Systems 220T and 220TX; Data Domain DD460 w/VTL; Diligent ProtecTIER VT, VTF Open, and VTF Mainframe; EMC CDL210, CDL710, CDL720, and CDL740; FalconStor VTL Std and Ent Editions; Fujitsu/Siemens CentricStor VTA; HP VLS6000; IBM TS7510, VTS B10, and VTS B20; Neartek VSE Appliance; Network Appliance NearStore VTL600 and VTL1200; Quantum Pathlight VX450 and VX650, DX30, and DX100; Sepaton S2100-ES2 Enterprise and Rack; Spectra Logic T950; Sun/StorageTek VSM, VTL1000, VTL2000, and VTL3000. Source: SVB Alliant
Continuous Data Protection (CDP)

Based on the trade press, CDP is even hotter than VTLs in the data protection market. CDP is defined by the Storage Network Industry Association (SNIA) to be: “A methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery points from any point in the past. CDP systems may be block-, file-, or application-based and can provide fine granularities of restorable objects to infinitely variable recovery points.”

IDC’s definition of CDP is similar: “Continuous data protection, also referred to as continuous backup, pertains to products that track and save data to disk so that information can be recovered from any point in time, even minutes ago. CDP uses technology to continuously capture updates to data in real time or near real time, offering data recovery in a matter of seconds. The objectives of CDP are to minimize exposure to data loss and shorten time to recover.”

In general, CDP is a journaling function that keeps track of all application data changes and time-stamps them. It theoretically allows an application or a user to recover data from any point in time. CDP has an RPO of zero; that is, the user can restore data to the moment just before the data corruption or failure, resulting in no data loss. Traditional data protection technologies usually deliver RPOs ranging from six to 24 hours, which can be far too long and represent too much potential data loss for mission-critical applications. Although the definition of CDP does not directly speak to the RTO, most CDP solutions are capable of very fast data restoration, ranging from a few seconds to several hours depending on the vendor and the product.

Figure 4: CDP (Linux, Unix, and Windows hosts running CDP software or agents, feeding a CDP collector attached to SAN and NAS storage). Source: SVB Alliant
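The journaling behavior described above can be sketched in a few lines (a hypothetical, in-memory illustration of time-stamped write capture and any-point-in-time restore; real products journal at the block or I/O driver layer):

```python
class CDPJournal:
    """Toy CDP journal: every write is time-stamped and retained, so the
    volume can be rebuilt as of ANY past instant (an RPO of zero)."""

    def __init__(self):
        self.entries = []   # appended in time order: (timestamp, block_id, data)

    def record_write(self, timestamp, block_id, data):
        self.entries.append((timestamp, block_id, data))

    def restore_as_of(self, timestamp):
        # Replay the journal up to `timestamp` to produce the volume image.
        volume = {}
        for ts, block_id, data in self.entries:
            if ts > timestamp:
                break           # entries beyond the recovery point are ignored
            volume[block_id] = data
        return volume

journal = CDPJournal()
journal.record_write(100, 0, b"v1")
journal.record_write(200, 0, b"v2 corrupted")   # a bad write at t=200
# Rewind to just before the corrupting write:
print(journal.restore_as_of(199))               # {0: b'v1'}
```

Every write between two recovery points is preserved, which is exactly the gap-free property that distinguishes CDP from snapshot-based approaches discussed later.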
This can be heady stuff for the IT manager charged with protecting the data, and it can be a significant expense. For instance, the infrastructure costs of snapshots or high-function VTLs are similar to the costs of CDP infrastructure. If CDP costs are compared with those of traditional data protection, there can be a large divergence. The features and the promises of CDP are not free.

CDP’s value comes from three different aspects. The first is the point solution of creating an RPO of zero while vastly accelerating the RTO. The second is as a platform or core technology: because CDP tracks data at a very fine-grain level, capturing every single write operation with guarantees across the entire application data set, it inherently protects application coherency and consistency. The third is the replicability of protected data from multiple locations to a central site and the ability to restore on different physical machines.

1. CDP Point Solution

The RPO and RTO improvements are quite useful for the IT organization’s most important applications. These applications are deemed mission critical because they play such an integral role in operations. For instance, a brokerage house uses a database to transact and recover security trades that drive its business. If this database is rendered incapacitated, the brokerage’s main business is no longer functional, with negative impacts such as loss of revenue, loss of customers, and possibly regulatory exposure. Often the RTO and RPO benefits are also interesting for applications that are not mission critical but are nonetheless very important and are either very large or changing so fast that traditional backup and restoration technologies are untenable. In both cases the ability to quickly restore any previous version of the application’s data is important.
CDP does not have any gaps in the protected data because it captures every file, block, or table change and nuance in the system being protected as it occurs. To best illustrate the gains CDP achieves as a point solution, it is worth using a real-life example of a mission-critical application recovery without, and then with, a CDP solution in place.

Microsoft Exchange is the principal e-mail server for a majority of organizations. It is the primary communications medium for many organizations’ employees, suppliers, vendors, and, most important, customers. This makes it mission critical. Microsoft Exchange is also extraordinarily difficult to restore with most data protection applications. Legacy data protection applications require hours, days, weeks, and occasionally months (when things go really, really awry!) for a full Microsoft Exchange restoration, which is far too long for a mission-critical application. Because restoring Microsoft Exchange is a lengthy and complex endeavor, as well as frustratingly time consuming depending on the number of mailboxes and messages that need restoring, it is an ideal candidate for CDP technology. Examining the Microsoft Exchange restore process makes this abundantly clear:
° First, apply the last full backup.
° Next, apply the transaction logs, if they are available (and often they are not).
° If the expertise is available, the messages and the transactions are restored to each individual mailbox, which is very time consuming. Most administrators will not even make the attempt, leaving a lot of data that is never restored.
° During the restoration process, Microsoft Exchange is down and a temporary server is required.

CDP makes Microsoft Exchange recoveries painless and fast. First it rewinds Microsoft Exchange back to the last known consistency point and gets it up and running in seconds or minutes. When Microsoft Exchange is running again, it provides a simple point-and-click restoration of messages and transactions back to the individual mailboxes, quickly and easily. A similar recovery experience exists for many enterprise applications such as IBM DB2, IBM Informix, Microsoft SQL Server, Oracle (other than the new 10g Enterprise versions), and Sybase. Older enterprise applications typically show a larger difference in recovery times and processes.

2. CDP Platform

As discussed, CDP captures all of the data as it is created. The result is the ability to re-create application data from any previous point in time with inherent application consistency properties. Comparing CDP with accomplishing similar application consistency through traditional approaches (backup, VTLs, snapshots, and SASs) demonstrates why CDP was developed. Traditional approaches capture a point in time of an application’s data. This can be problematic when the data is spread across multiple storage units or media: they cannot guarantee that all the pieces are captured at exactly the same time. One part of the application may be captured at 9:00:01 whereas another part is captured at 9:00:14. This scenario invalidates the internal transactional consistency that is embedded in database applications.
Those applications (e.g., IBM DB2, IBM Informix, Microsoft Exchange, Microsoft SQL Server, Oracle, Sybase, file systems, and others) usually rely on other mechanisms, such as hot backup or online backup, to allow backups to be taken while the application is still operationally active and to produce a recoverable backup set that is application consistent. These external actions force the applications to freeze momentarily while the backup is occurring. This process is complicated and can be error prone, with a negative impact on the performance of the underlying application. In contrast, most CDP implementations maintain application consistency and coherency without application disruption or operational complexity. The CDP stored data can be migrated into other protection domains (e.g., archival, tape, or optical disc) without losing this attribute.

Another benefit of CDP is its ability to provide, nondestructively, any-point-in-time virtual data copies. Some CDP applications automatically run test and audit processes, based on policy, on those copies of the data and mark these times as significant or as valid points in time for recovery if primary data is corrupted or infected with malware (viruses, worms, and the like). Other CDP applications allow the administrator to manually write scripts to automate the same processes. Not all CDP applications have this capability.
The platform concept is called continuous data technology (CDT) to distinguish it from CDP. The CDT value comes from incredibly fast RTO with known application consistency and coherency. This in turn allows fast recovery from application data corruption, malware attacks, criminal data destruction, and malicious employees. Additionally, as the protected data is migrated from online to nearline to offline, the application data consistency and coherency are preserved for each of those points in time.

3. CDP Protected Data Replicability

Data run through a CDT platform lends itself well to replication (offsite disaster recovery) and imaging, that is, using data for alternate purposes like testing, development, archiving, reporting, and auditing. Some vendors are already exploiting these areas, with distributed many-to-one for ROBO (Asigra, CA/XOsoft, CommVault, FalconStor) and one-to-one for data center to disaster recovery (DR) site (EMC/Kashya, Symantec/Revivio). Many of the other vendors have some version of it on their two-to-three-year roadmaps.

There are numerous variations on how CDP can be deployed. It can be part of the application itself, or it can be a stand-alone product. Oracle 10g Enterprise Edition has a feature called Flashback that is an example of CDP technology integrated into the application; it is the only current example of such a deployment. As a stand-alone product, CDP can be implemented as host-based (software that must be installed on the application host server) or network-based (software that runs on hardware connected to the storage or Internet Protocol network, independent of the application host and the storage subsystems), or it can be a part of the storage subsystem. When deployed as a host-based solution, CDP can be implemented either via operating system device drivers (as system agents) or as a part of the application.
This is the approach of Asempra, Atempo, CA/XOsoft, CommVault, EMC/Kashya, FalconStor, FilesX (when not utilizing Cisco’s intelligent switching module SSM), Iron Mountain, SonicWALL, and TimeSpring. Network implementations are the most flexible and involve installing software on a high-functioning storage network switch (Mendocino), using an appliance deployed into the storage network (EMC/Kashya when utilizing Cisco’s intelligent switching module SSM, FalconStor, Symantec/Revivio, and SonicWALL), or using dedicated servers on the Transmission Control Protocol/Internet Protocol (TCP/IP) network (Asigra). The storage network based solutions entail a higher infrastructure cost, particularly for companies that do not already have these networks in place. Additionally, some of these solutions (Asigra and Symantec/Revivio) can accommodate multiple related customer applications that must be protected, restored, and synchronized with one another. (Note: a few of the host-based vendors have promised network versions of their solutions in their roadmaps.) Embedded storage subsystem implementations do not yet exist, but several vendors are working on this variation.
In summary, CDP makes a lot of sense for those applications and data that require the absolute minimum amount of data loss with the fastest possible recovery. CDP is the best possible data protection, with the lowest RPO and RTO, while providing application data consistency and coherency. It is the highest level of insurance for data currently available on the market. And just like insurance, there is a cost to CDP. CDP typically utilizes 0.4 to 2.5 times the amount of disk that the primary data requires, depending on how far back in time the user wants to be able to restore. Plus, there are the software, maintenance, and subscription licensing fees. This leads to the conclusion that CDP has a definitive role in data protection for the organization’s mission-critical data that cannot be lost under any circumstances. Table 3 provides a brief, noncomprehensive look at those CDP products that are currently available.

Table 3: Select CDP Products on the Market (comparing, for each product: agentless operation via server (S) or switch (Sw); local/global de-duplication; mail support for Exchange, Notes, or GroupWise; database support for Oracle, SQL Server, or DB2; multigeneration versioning; block- or file-based design; CDP capture on write or time; rollback by time or event; Windows 2000/2003, Linux (Red Hat/SuSE), and UNIX (AIX/Solaris/HP-UX/Mac OS X) support; system-file protection; local or enterprise scale; standard, intelligent-switch, or unique hardware; SAN or TCP/IP network; at-rest and in-flight encryption; WAN-optimized ROBO support; and offsite repository replication). Products covered: Asempra Business Continuity Server; Asigra Televaulting 6.2; Atempo/Storactive LiveBackup; CA/XOsoft Enterprise Rewinder; CommVault QiNetix 6.1; EMC/Kashya RecoverPoint; FalconStor CDP; FilesX XpressRestore 3.0; InMage DR-Scout VX; IBM/Tivoli CDP for Files; Iron Mountain LiveVault CDP; Mendocino Recovery One and HP CIC; Symantec/Revivio CPS; SonicWALL CDP; Symantec/VERITAS Backup Exec 10d for Windows Server; TimeSpring Time Data. Source: SVB Alliant
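The CDP disk overhead cited in this section translates directly into capacity planning; a minimal sketch (the 0.4x to 2.5x multipliers come from the text, the 10 TB primary volume is a hypothetical input):

```python
def cdp_repository_range(primary_tb, low=0.4, high=2.5):
    """Extra disk a CDP journal needs on top of primary storage.
    The 0.4x-2.5x multipliers are the range cited in the text; the
    actual figure depends on the retention window and change rate."""
    return primary_tb * low, primary_tb * high

# Hypothetical example: 10 TB of protected primary data.
low, high = cdp_repository_range(10)
print(f"CDP repository: {low:.0f} to {high:.0f} TB on top of 10 TB primary")
```

A one-line estimate like this, plus the software, maintenance, and subscription fees noted above, is the "insurance premium" side of the CDP decision.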
Continuous Snapshot (a.k.a. Small Aperture Snapshot or SAS): Near CDP or CDP-like

Continuous snapshot, or small aperture snapshot (SAS), is often confused with CDP. In truth, it is similar but not the same. Continuous snapshot provides coarse-grain RPO and an RTO similar to the outside bounds of CDP (minutes to hours). There are two primary differences between CDP and SAS. First, snapshots are not continuous. There is a time gap between snapshots, whereas CDP has no time gap. This gap ranges from minutes to days, and it is the period in which data can be lost between snapshots. Second, because the SAS data capture has gaps between snapshot captures, most products (Cloverleaf, Exanet, Microsoft, Network Appliance, and Symantec/DCT) do not have the same application consistency attributes as CDP. This means they have to pause application operations for the snapshot data to be in a recoverable form. Small aperture snapshots are, as a rule, incremental over previous snapshots. This means they capture only the changes between snapshots, providing disk consumption attributes similar to those of CDP solutions.

Figure 5: Continuous Snapshot or SAS (agentless Linux, Unix, and Windows servers taking successive small aperture snaps, SNAP 1, SNAP 2, and so on, to NAS storage). Source: SVB Alliant
When snapshots are full volume and not incremental, the amount of storage required grows large amazingly fast. Each full-volume snapshot is the equivalent size of the primary data. If the primary data takes up as little as 1 terabyte (TB) of storage, continuous snapshots with an RPO of 15 minutes would require 96 TB of storage for just a single day of snapshots. This is not a financially practical solution for most organizations, especially when compared with the incremental snapshot and CDP approaches, which require 0.4 to 2.5 times the amount of disk space that the primary data requires.

One major advantage of continuous snapshot over CDP is that it does not require any agents. And, correspondingly, it does not have the costs associated with agents.

There are a limited number of vendors with continuous snapshot products on the market today (Cloverleaf, Exanet, Network Appliance). Table 4 provides a brief comparison of those continuous snapshot products currently available.

Table 4: Select SAS Products on the Market

Continuous Snapshot/SAS           | Exanet Exastore      | Network Appliance V-Series/FAS/GX ONTAP® Snapshot™  | Cloverleaf iSN
Snapshot granularity              | User defined (secs.) | User defined (mins.)                                | User defined (mins.)
Max. number of snapshots          | No limits            | 255 per virtual volume (up to 171,000 volumes)      | 200,000 per iSN
Max. file system volume size      | 1 exabyte            | 16 TB                                               | 16 TB
Max. block volume size            | NA                   | 2 TB                                                | 64 TB
Asynchronous remote replication   | Yes                  | Yes                                                 | Yes
Consistency groups                | No                   | Yes                                                 | Yes
Fast rollback to any snapshot     | Yes                  | Yes                                                 | Yes

Source: SVB Alliant
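The full-volume arithmetic above is easy to verify (a throwaway sketch; the 1 TB volume, 15-minute interval, and 0.4x to 2.5x incremental range all come from the text):

```python
# Full-volume snapshots: every snapshot is a complete copy of the volume.
primary_tb = 1                # primary data, from the example
interval_minutes = 15         # snapshot interval (the RPO), from the example

snaps_per_day = 24 * 60 // interval_minutes      # 96 snapshots per day
full_volume_storage_tb = snaps_per_day * primary_tb
print(full_volume_storage_tb)                    # 96 TB for one day of snaps

# Incremental snapshots or CDP store only the changes, cited at
# 0.4x to 2.5x of the primary data regardless of snapshot count:
print(primary_tb * 0.4, primary_tb * 2.5)        # 0.4 to 2.5 TB
```

The two orders of magnitude between 96 TB and 0.4 to 2.5 TB is why incremental capture, not snapshot frequency, determines whether small aperture snapshots are financially practical.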
Distributed Remote Office Branch Office (ROBO) Backup to Disk

Distributed ROBO backup to disk is the only current data protection technology designed from the ground up for a distributed environment. This means that the data protection software recognizes ROBOs that are connected to the central data center or DR site over limited-bandwidth wide area networks (WANs); each ROBO has limited or no data protection administration skills onsite, and users require local recovery of their data without an administrator. ROBOs must protect a wide variety of data, from operating systems in servers to desktops and laptops, at sites that range in size from a single user to hundreds of users.

Distributed ROBO backup to disk has been around for years. Most organizations were not aware of its existence because it had been primarily a service provider technology. Only recently (over the past few years) has it become licensable by end users. Because of the constraints above, distributed ROBO backup to disk must be more efficient than traditional backup or replication technologies. Additionally, it must be transparent to users, with intuitive data recovery, and provide either multiple data generations (SAS) or the RPO/RTO granularity of CDP. CDP has in fact been added as a feature to several distributed ROBO backup-to-disk products (Asigra Televaulting, CommVault QiNetix, and Iron Mountain LiveVault).

Figure 6: Distributed ROBO Backup to Disk (remote collectors at each ROBO, plus laptops, feeding a central backup system and NAS at the central data center over a TCP/IP WAN). Source: SVB Alliant
To provide the WAN efficiency so vital to the distributed ROBO backup-to-disk value proposition, a product should include de-duplication (locally and globally), WAN transmission of just the changes or delta data, and compression of the remaining data as it is transmitted across the WAN.

ROBO de-duplication eliminates transmission and storage of duplicate files or data blocks. Several distributed ROBO backup-to-disk vendors (Asigra, EMC/Avamar, Symantec/DCT) have their own de-duplication methodologies. The data block methodology tends to be much more efficient than the file methodology because it organizes the protected data bytes into data blocks of a specific standard length. It is these arbitrary data blocks that the de-duplication software tracks; any block that repeats is neither transmitted nor stored. The file methodology works with entire files, which are much bigger than the arbitrary data blocks. If the file or any part of the file changes, the entire file is transmitted and stored.

The typical range of de-duplication data reduction is between 60 percent and 90 percent. Eliminating this much data prior to transmission radically reduces bandwidth requirements for ROBO data protection. It also radically reduces the amount of storage required for protected data. The local de-duplication takes place at the ROBO location. The global de-duplication takes place at the central site data center or DR center; it removes the duplicates among all of the ROBO sites.

The next WAN efficiency gain comes from the transmission of just the ROBO delta changes. Delta change transmission means that only the files or delta blocks that changed are sent across the WAN. This has been a common feature of server replication and traditional backup for years, and it has an even more profound positive effect on de-duplicated, distributed ROBO backup to disk.
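The block-level methodology described above can be sketched in a few lines of Python. This is a toy illustration only: SHA-256 fingerprints over fixed 4 KB blocks stand in for whatever block size and hash a given product uses.

```python
import hashlib

BLOCK_SIZE = 4096  # an arbitrary fixed block length; real products vary

def dedupe(data: bytes, seen_digests: set) -> bytes:
    """Return only blocks not seen before; duplicates are neither
    transmitted nor stored, per the data block methodology above."""
    kept = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            kept += block
    return bytes(kept)

seen = set()
first = dedupe(b"A" * 8192 + b"B" * 4096, seen)   # two unique blocks survive
second = dedupe(b"A" * 4096 + b"B" * 4096, seen)  # pure duplicates: nothing sent
print(len(first), len(second))  # 8192 0
```

The file methodology would instead hash whole files, so a one-byte change forces the entire file across the WAN; here only the changed 4 KB block would travel.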
De-duplication requires a metadata database to keep track of all of the data blocks and/or files that pass through the distributed ROBO backup-to-disk software. It removes the duplicates and places a marker or stub in their place. When the data block or file is recalled or recovered, that marker lets the database know where to pull the stored data. The metadata database is the engine that allows de-duplication to take place. Unfortunately, it can also limit the scalability of each image of the software. The larger the database becomes (and it will continue to get larger as more and more data is protected), the slower it becomes. As the metadata database slows, so do the backups and, more notably, the recoveries. This inherent size limitation can mean that multiple iterations of the software will be running in a large enterprise environment. Delta changes relieve a big part of the effect of these size limitations: by reducing the backups to delta changes, the de-duplication metadata database has less data to sort and track on an ongoing basis. Less data ultimately means greater performance, scalability, and efficiency.

The third phase of distributed ROBO backup-to-disk efficiency comes from compression. This part can be confusing for some IT managers, analysts, and the trade press, because de-duplication is often represented as a supercompression technology. When data is de-duplicated, it does not have to traverse the WAN at all; the rest of the data, however, must still traverse the WAN. Compression of that data can further enhance the distributed ROBO backup-to-disk software's efficiency. Standard compression removes the nulls in the data, with characteristic results of approximately 2:1.
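The combined effect of de-duplication and compression can be sketched as simple arithmetic, using the 60-90 percent de-duplication range and the characteristic 2:1 compression ratio cited above (the function name and sample sizes are ours):

```python
def wan_gb_after_reduction(total_gb: float, dedup_fraction: float,
                           compression_ratio: float = 2.0) -> float:
    """Data left to cross the WAN after de-duplication, then compression."""
    survivors = total_gb * (1.0 - dedup_fraction)  # 60-90% removed by de-dup
    return survivors / compression_ratio           # characteristic ~2:1

print(wan_gb_after_reduction(100, 0.60))  # ~20 GB of a 100 GB backup
print(wan_gb_after_reduction(100, 0.90))  # ~5 GB
```

Delta-change transmission shrinks `total_gb` before this pipeline even starts, which is why the three techniques compound so effectively.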
Combining de-duplication (removing 60 to 90 percent of the protected data that must traverse the WAN and be stored on disk) with delta changes (reducing the amount of data that requires de-duplication) and compression (further reducing the amount of data that must traverse the WAN and be stored on disk) provides a net result of an incredibly efficient data protection system for distributed ROBO. Another factor that improves the speed of recoverability (RTO) is that all backups, just like CDP, are native to disk. They do not require tape, tape libraries, or virtual tape, although the data can be migrated to tape as the protected data ages.

Distributed ROBO backup to disk is available today from six vendors in various forms (Asigra, EMC/Avamar, IBM/TSM, Iron Mountain, Seagate/EVault, and Symantec). Table 5 provides a brief, non-comprehensive comparison of those distributed ROBO backup-to-disk products that are currently available.
Table 5: Select Distributed ROBO Backup-to-disk Products on the Market

Columns: Asigra | EMC/Avamar | Iron Mountain | Seagate/EVault | Signiant | Symantec. Product names as listed in the original: Televaulting, Replicator, InControl, InfoStage, LiveVault, ArcWare, AXIOM, InSync, Continuum, Desktop RDP, Pure Disk.

Server (S), Desktop (D), Laptop (L) support
Windows | S/D/L | S/D/L | S/D/L | S/D/L | S | S
Red Hat Linux | S/D/L | S/D/L | S/D/L | S/D/L | S | S
Novell Suse Linux | S/D/L | No | No | No | S | S
Novell NetWare | S/D/L | No | No | S/D/L | No | No
Mac OS X | S/D/L | No | No | No | S | No
HP-UX | S/D | No | No | S/D | S | No
HP Tru64 Unix | S/D | No | No | No | No | No
Sun Solaris | S/D | S/D | S/D | S/D | S | No
IBM AIX | S/D | S/D | No | S/D | No | No
EMC VMware | S/D | No | S/D | No | No | No
IBM iSeries OS/400 | S | No | No | S | No | No

Mail server and database support
Microsoft SQL Server | Yes | Yes | Yes | Yes | Yes | No
Microsoft Exchange Server and Outlook 2000/2003 | Yes | Yes | Yes | Yes | Yes | No
Oracle 8 and above | Yes | Yes | Yes | Yes | No | No
DB2 | Yes | No | No | No | No | No
MySQL | Yes | No | Yes | No | No | No
PostgreSQL | Yes | No | Yes | No | No | No
IBM Lotus Notes/Domino Server | Yes | No | No | No | No | No
Novell GroupWise | Yes | No | No | No | No | No

Advanced functionality
CDP | Yes | No | Yes | Yes | No | No
Autonomic verification and healing (data cleansing) | Yes | Yes | No | No | No | No
No host software (agent) required | Yes | No | No | No | No | No
Local de-duplication (SIS) | Yes | Yes | No | No | No | Yes
Global de-duplication (GSIS) | Yes | Yes | No | No | No | Yes
Enterprise scalability | Yes | No | Yes | No | No | No
Bare metal restore | Yes | No | No | No | No | No
Archival | Yes | No | Yes | Yes | No | No
ROBO (local) recovery | Yes | Yes | Yes | Yes | No | Yes
WAN optimization (compression, delta changes, other) | Yes | No | No | No | Yes | No
Native backup to tape and tape libraries | No | No | No | No | Yes | No
File search capabilities | Yes | Yes | Yes | Yes | No | Yes
Single pane of glass management (Web portal) | Yes | Yes | Yes | Yes | Yes | No
Secure encryption in-flight and at-rest | Yes | Yes | No | Yes | In-flight | Yes

Source: SVB Alliant
de-duplication — a.k.a. single instance storage (sis) and global single instance storage (gsis)

As discussed in the previous sections, there are measurable benefits from de-duplication. The methods of de-duplication are by files, data blocks (arbitrary fixed lengths of data with hashes), blocklets (variable lengths of data with hashes), and bytes. Files are coarse-grain de-duplication. Data blocks are medium-grain de-duplication. Blocklets are medium- to fine-grain de-duplication. Bytes are very-fine-grain de-duplication.

Figure 7: De-duplication (incoming data is checked against previously sent data; only new data is stored). Source: SVB Alliant

De-duplication is primarily being deployed for secondary, replicated, or backup data, though it is starting to appear in some primary storage systems. As an enabling technology for lowering storage costs (data at-rest) and WAN costs (data in-flight), the de-duplication addressable market has the potential to grow significantly beyond the secondary data market. Tiered storage environments are the most likely candidates to deploy data de-duplication as data moves from tier-one to tier-two storage.

There is user fear that de-duplication may delete unique data by mistake. If that were to happen, customers would not know about it until an application or a user attempted to
access the protected, stored data. The chances of this happening are remote. Nevertheless, in the era of compliance, one data loss can have far-reaching financial consequences for an organization. There are ways around this potential problem.

One method is to de-duplicate as a background task (Sepaton), as previously discussed in the VTL section. Data is first stored in the storage system on hard disk used as a cache, and then the data is de-duplicated. The downside of this approach (nothing is free) is that it requires a bit more capacity to be used as a temporary cache to store all of the data while the de-duplication algorithms are applied. The other method is to cleanse or heal the data (Asigra and Quantum/ADIC/Rocksoft). This method constantly checks the data for errors. If an error is discovered, the de-duplication algorithm has the application resend the data. This method essentially cleanses the data and provides autonomous self-healing of the de-duplicated data. Both methodologies ensure that no unique data is deleted accidentally.

De-duplication is a very important enabling technology. As an enabler for data protection applications, de-duplication can be an integrated feature of the application or a supplement to older backup and replication products that cannot perform de-duplication themselves. This allows those applications to gain de-duplication benefits without an organization having to rip out and replace its current data protection software. As with most supplemental or feature adjunct markets, it is essentially just a bandage. As such, this part of the de-duplication market has a limited addressable market, as discussed in the context of VTLs. It is unlikely to counter the open server VTL market decline as backup to disk becomes more pervasive. Stand-alone de-duplication is available from a few vendors (Data Domain, Diligent, ExaGrid, and, to a more limited extent, EMC/Avamar).
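The "cleanse or heal" method described above can be sketched as follows. This is a toy illustration, not any vendor's implementation: each stored block keeps the hash recorded at ingest, and a background scrub re-hashes each block, asking the application to resend anything that no longer matches.

```python
import hashlib

store = {}  # digest -> stored block

def ingest(block: bytes) -> str:
    """Record a block along with its fingerprint at ingest time."""
    digest = hashlib.sha256(block).hexdigest()
    store[digest] = block
    return digest

def scrub(resend) -> int:
    """Re-verify every stored block; heal corrupted ones via resend(digest)."""
    healed = 0
    for digest, block in store.items():
        if hashlib.sha256(block).hexdigest() != digest:
            store[digest] = resend(digest)  # the application resends the data
            healed += 1
    return healed

d = ingest(b"quarterly results")
store[d] = b"bit rot!"                             # simulate silent corruption
print(scrub(lambda digest: b"quarterly results"))  # 1 block healed
```

Background de-duplication (the cache-first approach) trades extra temporary disk for the same guarantee; this scrub-and-resend loop trades ongoing verification work instead.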
Table 6 provides a brief comparison of those de-duplication products currently available.

Table 6: Select De-duplication Products on the Market

Vendor and product | Max. capacity (TB) | Max. throughput (TB/hr) | WAN replication vaulting
Diligent ProtecTIER Appliance | 1,000 | 2.9 (cluster of 4) | No
ExaGrid ExaGrid Server | 5 | 0.5 | No
Data Domain DD460G | 233 | 4.6 | Yes

Source: SVB Alliant
security and encryption — in-flight and at-rest

Security is similar to insurance. It is a cost center and does not generate profits for the organization deploying it. And just like insurance, not having it and needing it can be financially devastating. Security has rapidly moved from an IT back-burner issue to a very high priority for everyone up through executive management and even the board of directors. Much of the priority shift can be tied to the spate of high-profile data losses and thefts and the subsequently tighter regulatory environment. The majority of the press on the topic has been about lost, unencrypted backup data containing sensitive personal data valued for identity theft. The increased regulations and legislation, as discussed earlier, specify not just the levels of data protection and security required but also both financial and even criminal penalties for compliance failures.

Security and encryption are a fundamental part of data protection. Security means providing the assurance that valuable, protected data will not be taken away or accessed by unauthorized personnel (see the SVB Alliant report "Securing Against the Internal Threat," August 2006). Many hackers and criminal organizations have recognized protected data as being highly valuable; it would not be protected otherwise.

Figure 8: Security and Encryption (protected data behind a firewall, connected over a TCP/IP WAN). Source: SVB Alliant
These hackers and criminal organizations are now focusing their efforts on capturing this type of data as well. Disgruntled employees will often apply their malicious efforts to capture, corrupt, or destroy protected data, because their efforts will usually not be detected until the data has to be recovered. Even then it is difficult to identify the culprit without significant forensic analysis.

Encryption is no longer just a tape issue, and it is not the only aspect of data storage security. Data protection security must be part of the data life cycle from creation to destruction. IT generally views security as seven primary functions when protecting data:

1. Intruder detection and prevention
2. Allowing authorized access while preventing unauthorized access
3. Data in-flight (tapping) capture prevention
4. Data at-rest capture prevention
5. Audit trail of all access and changes
6. Guaranteed data change prevention
7. Digitally certified data destruction

1. Intruder Detection and Prevention

Intruder detection and prevention includes firewalls, viral scanners, and tracing of external access to the private networks. This is considered minimal security and is offered by a wide variety of vendors.

2. Allowing Authorized Access While Preventing Unauthorized Access

Authorized access makes sure that only those who have the rights and the privileges to the data can access it. This is usually managed through access control lists (ACLs) or tokens. ACLs also determine the level of data access: a corporate administrator may have read-and-change access, whereas a department administrator may have only read access. ACL user authentication has historically been driven by login and password. More recently, user authentication has been moving towards biometrics (fingerprints, voice prints, retinal prints, and the like).
Authorization security is primarily found in server software (BitArmor, EMC/RSA, Vormetric) and in some cases storage networks (Brocade/McDATA, Cisco).

3. Data In-flight (Tapping) Capture Prevention

Preventing data capture while in-flight over the WAN starts with the virtual private network (VPN); however, all TCP/IP networks can be tapped or sniffed with nominal hardware and software. This scenario requires encryption in-flight so that even if the data is tapped, it is not readable. Encryption for data in-flight can be located within the application server (BitArmor, EMC/RSA, Vormetric) or in encryption appliances at the edge of the WAN (Cipher Optics, Kaman).

4. Data At-rest Capture Prevention

Preventing data capture at-rest is the area in which chief information officers are feeling the most heat. Lost backup tapes, flash thumb drives, and laptops are the most concerning, but even primary online data has increasingly become a concern. Once again encryption comes into play. Encryption for primary data may be too much for some application performance requirements, given the additional latency of encrypting and decrypting. Encryption today is an absolutely mandatory requirement for secondary data on tape, optical
disc, or hard disk. Encryption for secondary data (a.k.a. backed up or replicated data) can take place in the data protection software or on an appliance in front of the tape drive/library, hard disk, or optical disc (MaXXan, NeoScale, Network Appliance/Decru).

For primary application data, IT and application administrators must make a decision about the security risk versus the performance degradation. These decisions will be coming for many enterprise organizations on every application. Measuring the security risk will be key. Determining how much data is at risk as well as the value or negative impact of the loss of that data, and evaluating whether non-encrypted security efforts will be enough, will vary by application and organization. Encryption at-rest can occur in the application server with a SAN-based key management appliance (BitArmor, EMC/RSA, Ingrian, Vormetric) or in a point-based SAN appliance (MaXXan, NeoScale, Network Appliance/Decru).

The most important aspect of encryption at-rest is the key management. Key management is not an issue for encryption in-flight because the keys are only temporary. It is a different story for data encryption at-rest. Some data may have storage requirements that span decades; data retention periods can range from a few years to 30 or more. The Sarbanes-Oxley Act, for example, states that companies must save electronic records and messages (e-mail/instant messages) for at least five years to ensure that auditors and other regulators can easily obtain requested documents. The Basel II Capital Accord requires banks to maintain three to seven years of data history.
Figure 9: Record Retention Periods Mandated by Various Regulations in the United States

- Government (OSHA): toxic exposure records, 30 years
- Life Sciences/Pharmaceutical (21 CFR Part 11): records for food (manufacturing, processing, packing), 2 years after release; records for drugs (manufacturing, processing, packing), 3 years after distribution; records for bio products (manufacturing, processing, packing), 5 years after end of manufacture
- Healthcare (HIPAA): hospital medical records (either original or legally reproduced form); medical records for minors, from birth to 21, possibly life; medical records, 2 years after patient death
- Financial Services (SEC 17(a)-4): financial statements; member registration for broker/dealers, end of life of the enterprise; customer account documents, end of account plus 6 years
- Sarbanes-Oxley: financial and correspondence data, 4 years after audit

Source: Enterprise Storage Group and SVB Alliant
The data encryption keys must be managed and survive for the entire life of that data. If the encryption keys are lost, the data is lost as well. End users are demanding that key management be brain-dead simple. This again comes down to the length of time the encrypted data is stored: there is a high probability that personnel will change during the time the data is stored, and if the key management is difficult and requires training, it is likely the keys could be lost or made inaccessible. The key management issue will make or break an encryption-at-rest solution.

5. Audit Trail of All Access and Changes

This aspect of security data protection may seem inherently obvious. It provides the forensic capabilities of tracking users who attempt or succeed at unauthorized access. This capability is akin to video monitoring: it keeps a record in case it is needed. Audit trail capability is considered a must-have feature of any security implementation. It is neither a separate product nor a market.

6. Guaranteed Data Change Prevention

This is a relatively new requirement for data protection security. It can be traced to compliance regulations, legislation, and e-discovery. It is especially important for e-discovery, which examines whether an original electronic document has been altered in any way. The outcome of litigation can hang in the balance based on evidence that the original document has or has not been altered. This type of security data protection requires what is known as a cryptographic hash function to authenticate that the data has not changed. This type of authentication is a feature that can be found in content addressable storage, also known as CAS (Archivas, Caringo, EMC, Nexsan, Permabit). CAS is a specially modified server file system front-ending direct attached storage, SAN storage, or network attached storage (NAS).
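The hash-based authentication just described can be sketched in a few lines. SHA-256 stands in here for whichever cryptographic hash a given CAS product actually uses; the function names are ours:

```python
import hashlib

def content_address(document: bytes) -> str:
    """The hash doubles as the content's address and its fingerprint."""
    return hashlib.sha256(document).hexdigest()

def unaltered(document: bytes, address: str) -> bool:
    """Any change to the document changes its hash, making alteration evident."""
    return content_address(document) == address

original = b"Executed contract, August 14, 2006"
addr = content_address(original)
print(unaltered(original, addr))                               # True
print(unaltered(b"Executed contract, August 15, 2006", addr))  # False
```

Because the address is derived from the content itself, a CAS system cannot silently serve an altered document under the original address, which is precisely what e-discovery requires.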
Another methodology is to embed change security protection directly into each application file system (BitArmor, EMC/RSA).

7. Digitally Certified Data Destruction

Just as making sure that data has not been altered is incredibly important, so too is making sure that when data reaches its end of life it is thoroughly and legally destroyed. This requires digital certification that the data has been destroyed, covering not just the primary data but the secondary or backup data as well. Some innovative vendors have clever ways of providing digitally certified data destruction. The easiest way is to destroy the encryption keys. This makes all copies of the encrypted data unreadable and legally destroyed; however, it does not destroy any unencrypted data, nor does it destroy backup data that may have copies of the keys. Digitally certified data destruction is provided within the overall security solution (BitArmor, EMC/RSA, Vormetric) or the backup/replication solution (Asigra).

A much more detailed examination of the security market, the vendors, and the solutions can be found in the SVB Alliant report "Securing Against the Internal Threat," August 2006.
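The destroy-the-keys approach can be illustrated with a toy cipher. The SHA-256 counter-mode keystream below is a demonstration only, not a production cipher or any vendor's implementation; the point is simply that once the key is gone, every copy of the ciphertext becomes unreadable.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic keystream from the key (demonstration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with a key-derived keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)
ciphertext = xor_crypt(key, b"customer records with SSNs")
assert xor_crypt(key, ciphertext) == b"customer records with SSNs"
key = None  # "destruction": without the key, the ciphertext is unrecoverable
```

As the text notes, this only certifies destruction of data that was encrypted in the first place; unencrypted copies and backed-up keys remain the weak points.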
market research

market requirements

Successful products fulfill market requirements. Market research identifies market requirements while revealing unexpected information. Interviews by Dragon Slayer Consulting with 178 enterprises and small or medium-sized enterprises (SMEs) over the first half of 2006 have brought to light new information that contradicts the experts and much of the conventional wisdom. SMEs are generally defined as businesses under 500 employees. Figure 10 summarizes the results of the SME survey.

SME survey comments included:

° Tired of multiple management schemes for different data protection products, especially those requiring partnerships with other vendors and products.
° Willing to go with a less comprehensive data protection scheme to reduce management complexity.
° Looking for point-and-click simplicity that allows assignment of protection levels and technologies by application requirements versus infrastructure limitations.
° Really want a standardized methodology for determining the cost of application outages that allows an easier time of assigning the most cost-effective data protection scheme per application.

Dragon Slayer Consulting interviews of 63 small or medium-sized businesses (SMBs) in the first half of 2006 produced somewhat different results. SMBs are generally defined as businesses with more than five and less than 100 employees.
Figure 10: 2006 U.S. SME Data Protection Survey (percent who agree or strongly agree)

1. Want to lower the cost of data protection while raising the functionality
2. Prefer disk over tape backup for data protection if priced cost-effectively
3. E-discovery becoming important data protection driver and importance is growing
4. Prefer one data protection system to cover both the data center and the ROBO
5. Want data protection to guarantee that data is recoverable
6. Worried that current data protection is not effectively protecting ROBO and could be non-compliant
7. Want vendors to correct situation quickly and are exploring alternatives
8. Security is currently one of the highest priorities for data protection
9. Replication of primary data is an equally high priority
10. De-duplication can significantly reduce data protection WAN and storage cost
11. Demanding simpler, easier-to-use data protection technology
12. Demanding management of the data protection (even automatic) that requires no user training
13. Single management for all data protection (VTL, CDP, backup disk or tape, archival, security, RPO/RTO)
14. Would like to assign the level of data protection by application

Source: Dragon Slayer Consulting
Comments included:

° Simplifying data protection is important, and any product that does so gets a look.
° Really want a single product, single interface, single training for all data protection needs.

what the numbers mean

First and foremost, the enterprise and SME markets are very similar, whereas the SMB market is markedly different.

° The enterprise and the SME are willing to entertain different levels of data protection and technologies that are assignable by application. The SMB is for the most part not that sophisticated and is looking for a single product for all of its data protection needs. Whereas the enterprise and the SME would prefer a single product, they want a product that incorporates sophisticated features and multiple technologies that give them flexibility. The SMB is quite happy to have one technology where all data is treated equally.
° The appeal of tape is rapidly declining as a primary data protection medium for the enterprise, SME, and SMB as disk-based solutions become more cost-effective per gigabyte.
° Service providers have greater appeal to the SMB than to the enterprise or SME. The enterprise and the SME are more likely to offer data protection as an internal chargeable service.
° Security is in the forefront of the minds of the enterprise and the SME. Many are planning to do something within the next 24 to 30 months.

The SMB survey statements (percent who agree or strongly agree):
- Don't differentiate application data (treat all the same)
- Prefer disk over tape backup for data protection if priced cost-effectively
- Lack data protection administration skills and want brain-dead simple data protection
- Service provider at right price would be attractive alternative for data protection
- Storage security not a high priority at this time and foresee no change soon

Figure 11: 2006 U.S.
SMB Data Protection Survey. Source: Dragon Slayer Consulting

° Security for the SMB has not risen above the firewall and virus-scanning levels and will not for the foreseeable future.
° For the enterprise and the SME, ROBO data protection is becoming a major issue that must be resolved for compliance. It is a nonissue for the SMB.
° All three markets are becoming much more cost sensitive to their data protection solutions and are demanding greater simplicity in implementation, operations, and management.
NGDP Direction over the Next Three to Five Years

Predicting the future can be a precarious and risky proposition; nevertheless, there are some logical conclusions that can be drawn from past experience, current trends, legislation, regulation, market research, and educated guesswork.

1. Narrowly focused feature markets will become part of more comprehensive data protection products.

This has already started. CDP is a zero-cost feature of Asigra's Televaulting distributed backup product (agentless). CDP is a feature of CommVault's QiNetix product suite (utilizing the same agent and database as its backup, replication, storage resource management, and software monitoring). CDP is also a feature of Symantec's Backup Exec 10d product line (utilizing the same agent and database as the backup software). Security in the form of ACLs, encryption in-flight, and encryption at-rest now appears as an option in data protection software from Atempo, Asigra (no charge), EMC/Legato, EVault, IBM/TSM, Iron Mountain, and Symantec. We believe that point solution vendors will continue to be bought by larger players and do not have large enough end markets to survive on their own.

2. Some Band-Aid product markets will most likely fade away or become insignificant.

Despite some industry analyst views on VTLs' becoming a multibillion-dollar market, and while we recognize that the tape market is huge and that it will take companies years to transition from tape to disk, we question the viability of the stand-alone VTL vendor. We believe that VTLs need to be part of a more broad-based platform. There are a couple of reasons why we take this position. First, we believe that storage vendors with large installed bases of tape customers will want to protect their turf by providing multiple backup and archive solutions beyond just partnerships.
Second, we believe the open server VTL market appears to be one of those in which the value proposition declines over time (the exception being the mixed open server and mainframe VTL markets), so a stand-alone company will need to innovate quickly, add new products, or re-create itself.

3. Security will become table stakes in enterprise and SME data protection products.

EMC's acquisition of RSA, Network Appliance's acquisition of Decru, Symantec's acquisition of VERITAS, and the increasing number of data protection products with integrated security are indicative of the increasing importance of security.

4. Enterprises, SMEs, and SMBs will require all data protection products to become more sophisticated and automated.

Sophisticated does not mean more complicated for the user. It generally means less complicated and easier to use while providing more functionality with increased automation. An example of this automation will be the calculation of application outage costs. Determining the real organizational cost per application outage is very important in figuring out the level of data protection that is required for that application. Although it is a simple standardized cost formula (see below), most end users do not know how to calculate an application outage cost to determine the RPO, the RTO, and the technology to utilize.
Cost per Application Outage = (RPO + RTO) × (HR + LR) × Length of Outage in hours

° RPO = recovery point objective, or the amount of data that can be lost per application per outage
° RTO = recovery time objective, or the time it takes to be back in operation per application per outage
° HR = lost worker productivity per hour of downtime, or the cost per hour of the nonproductive worker
° LR = lost revenue per hour of downtime

Calculating the cost per application outage must become an easy, automated tool for end users, enabling them to make knowledgeable, informed decisions and shortening the vendors' sales cycles.

5. ROBO will be an integral part of data protection products.

It will no longer be financially practical to relegate the ROBO data to the same practices as data center data because of the increasing costs and the decreasing remote skills. This means that data protection products must take into account ROBO issues, including bandwidth and, once again, security. With numerous emerging ROBO products encroaching on traditional data center functionality, we anticipate a future where products do both; we expect large vendors to seek growth in the ROBO market, and it will become a make-versus-buy decision.

6. E-discovery will push more data online vs. offline, increasing the decline of tape.

Time is money, and never more so than with attorneys. The United States and Europe are litigious societies, with litigation increasing every year. It is logical to conclude that protected and archival data that has traditionally been offline will increasingly be online (on disk versus tape). And it will have extensive fast-search capabilities.

7. The majority of data protection products will be disk based, with tape being used primarily for long-term archival storage.

Disk is faster for all types of recoveries, such as CDP.
The increasing requirements for speed of recovery, convenience, lower cost of media, and de-duplication will drive this trend.

8. De-duplication will become integral to data protection products and not an add-on.

De-duplication is already part of the distributed ROBO backup offerings from Asigra, EMC/Avamar, and Symantec. The hardware costs, savings, and bandwidth reduction for ROBO are very compelling and will become table stakes within the next three to five years. De-duplication is also part of the VTL offerings from Data Domain, Diligent, and FalconStor (COPAN, EMC, HP, IBM), with Quantum/ADIC/Rocksoft planning a wide variety of D2D products leveraging its patented Rocksoft blocklet de-duplication technology. Those who don't have de-duplication will be at a long-term disadvantage and potentially noncompetitive.
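The standardized outage-cost formula given under point 4 above can be implemented directly as stated in the report; the sample inputs below are hypothetical:

```python
def cost_per_application_outage(rpo_hours: float, rto_hours: float,
                                hr_per_hour: float, lr_per_hour: float,
                                outage_hours: float) -> float:
    """Cost = (RPO + RTO) x (HR + LR) x length of outage, as stated above."""
    return (rpo_hours + rto_hours) * (hr_per_hour + lr_per_hour) * outage_hours

# Hypothetical figures: 15-minute RPO, 1-hour RTO, $5,000/hr lost worker
# productivity, $20,000/hr lost revenue, 2-hour outage.
print(cost_per_application_outage(0.25, 1.0, 5_000, 20_000, 2.0))  # 62500.0
```

A calculator like this, embedded in the product, is exactly the kind of automation the text argues end users need to pick an RPO, an RTO, and a technology per application.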
Conclusions

The data protection vendors will have to develop and integrate this broadening base of functionality by themselves, acquire it, or be acquired. This is a classic build-versus-buy decision. Conventional wisdom is that if a vendor cannot develop all of the functionality required by the market, it should consider strategic partnerships. That has been the historic roadmap, and the logic is that if it worked in the past, it should work in the present. Like most conventional wisdom, it is correct for the past and not necessarily for the present or future. NGDP strategic partnering requires multiple licenses, multiple management touch points or interfaces, multiple maintenance/subscription agreements, multiple agents (each using a couple of percentage points of CPU processing power), broader administration technical skills, and much more training. This is contradictory to the IT organization's intensifying trend of reduced skills, doing more with less, increasing responsibility with fewer people, and a focused effort on making things easier.

We believe that users are going to demand that vendors provide a suite versus a point product for their NGDP solutions. Therefore, the most viable alternative is for the NGDP market to consolidate. Failure to keep up with market requirements is a recipe for failure and will relegate many current and leading vendors to historical footnotes. As with most maturing markets, there is a high likelihood of increased M&A activity, with momentum accelerating over the next three to five years as the market leaders, faced with the innovator's dilemma, have difficulty meeting NGDP market requirements and it becomes increasingly difficult for startup vendors to offer enough to break out from the pack.
Table 7: Select Storage Industry M&A Transactions ($ in millions, except per share value)

| Announce Date | Acquirer | Target | Target Business Description | Subsector | Transaction Value | Target LTM Revenue | Revenue Multiple |
| 12/21/06 | Seagate | EVault | Provides continuous data protection and recovery software | CDP | $185.0 | $35.0 | 5.3x |
| 11/27/06 | Symantec | Revivio | Provides enterprise-class continuous data protection and recovery software | CDP | N/A | N/A | N/A |
| 11/08/06 | Network Appliance | Topio | Provides disaster recovery data replication and migration software | Replication | 160.0 | 7.5 | 21.3x |
| 11/01/06 | EMC | Avamar | Develops enterprise data protection software solutions | De-duplication/ROBO | 165.0 | 22.0 | 7.5x |
| 10/25/06 | LSI Logic | StoreAge | Provides SAN storage management and advanced, multi-tiered data protection solutions | CDP | 50.0 | ~5.0 | 10.0x |
| 09/25/06 | GlobalSCAPE | Avail | Provides software solutions to deliver remote-site file sharing, continuous data and database backup, and acceleration | SAS/backup | 9.7 | N/A | N/A |
| 07/11/06 | CA | XOsoft | Provides continuous application and information solutions | CDP | 80.0 | 15.0 | 5.3x |
| 05/09/06 | EMC | Kashya | Provides disaster recovery, continuous remote replication, and continuous data protection solutions across various storage area networking environments | Replication | 153.0 | 6.0 | 25.5x |
| 05/02/06 | Quantum | ADIC | Supplier of automated tape systems, data management software, storage networking appliances, and disk-based backup and restore solutions | Backup | 513.0 | 472.8 | 1.1x |
| 04/12/06 | Crossroads Systems | Tape Laboratories | Solutions include data compression, tape arrays, interface emulation, and disk-based virtual tape solutions | VTL | N/A | N/A | N/A |
| 03/15/06 | ADIC | Rocksoft | Provides redundant-data elimination technologies and products, including blocklets, and a software development kit | De-duplication | 65.0 | Pre-revenue | N/A |
| 03/06/06 | Atempo | Storactive | Provides content data protection software for Microsoft Windows; solutions include LiveBackup, a client/server, real-time backup software delivering automatic data backup and end-user file recovery | CDP | N/A | 5.0 | N/A |
| 12/01/05 | Iron Mountain | LiveVault | Provides disk-to-disk backup and data recovery services; offers LiveVault InSync, a tape-free server backup and recovery service providing automatic backup, offsite data storage, data restore, and protection of open databases and files | Backup/recovery | 50.0 | 10.0 | 5.0x |
| 11/21/05 | SonicWALL | Lasso Logic | Provides optical disc data protection systems, including a continuous data protection appliance and a tape backup replacement solution; additionally provides real-time continuous data protection for servers, laptops, and personal computers locally and offsite | CDP | 20.0 | 2.0 | 10.0x |
| 11/16/05 | BakBone Software | Constant Data | Provides data integration, protection, and storage management; also develops data replication and clustering software for heterogeneous server and operating environments | Replication | 5.5 | N/A | N/A |
| 08/17/05 | Seagate | Mirra | Networked digital content protection and backup products; offers Mirra Personal Server, a personal computer backup system to back up, access, and share digital files | Backup | 15.0 | N/A | N/A |
| 08/08/05 | Overland Storage | Zetta Systems | Offers ZettaServer IR, which virtualizes the physical storage attached to it and presents it as block level, and ZettaServer NAS, a Web-based user interface used to access the snapshot files through hidden folders | SAS | 9.0 | Pre-revenue | N/A |
| 04/20/05 | VERITAS | Data Center Technologies | Provides remote backup applications via a content addressable storage engine | Data reduction | 60.0 | 1.5 | 40.0x |
| 04/07/05 | Network Appliance | Alacritus Software | Virtual tape library and CDP backup software; offerings include continuous data protection technology and open systems virtual tape library appliance software | VTL/CDP | 11.0 | N/A | N/A |
| 10/12/04 | EMC | Dantz Development | Products protect computers by providing backup and recovery for file servers, desktops, notebooks, and business-critical applications for SMBs | Backup/recovery | 45.0 | N/A | N/A |
| 10/12/04 | Iron Mountain | Connected Corporation | Archiving, data protection, services, and software that support personal computers over the Internet and on corporate intranets | Backup services | 117.0 | 36.0 | 3.3x |
| 03/30/04 | Mendocino Software | Vyant Technologies | Offerings include data storage, backup, replication, and professional services | Data protection/replication | N/A | N/A | N/A |
| Mean | | | | | | | 12.2x |
| Median | | | | | | | 7.5x |

Transaction value reflects adjustment for cash and debt from purchase price.
Legend: LTM: last 12 months; N/A: not applicable
Data as of January 10, 2007
Sources: Capital IQ, The 451 Group, SVB Alliant, company press releases and Web sites, and miscellaneous news articles
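As a sanity check on Table 7's summary statistics, the mean and median of the eleven disclosed revenue multiples (transaction value divided by target LTM revenue) can be recomputed directly with Python's standard library:

```python
import statistics

# The disclosed revenue multiples from Table 7 (N/A rows excluded)
multiples = [5.3, 21.3, 7.5, 10.0, 5.3, 25.5, 1.1, 5.0, 10.0, 40.0, 3.3]

print(f"Mean:   {statistics.mean(multiples):.1f}x")    # 12.2x
print(f"Median: {statistics.median(multiples):.1f}x")  # 7.5x
```

Both values match the Mean (12.2x) and Median (7.5x) rows of the table.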
Table 8: Select Storage Industry Company Comparables ($ in millions, except per share value)

| Company | Price 1/10/07 | 52-Wk High | 52-Wk Low | LTM Price Change | Equity Market Value | Enterprise Value | EV/Revenue CY05A | EV/EBITDA CY05A | EV/Earnings CY05A | Revenue CY05A | Cash CY05A | Debt CY05A |
| BakBone Software | $1.40 | $2.90 | $0.95 | (12%) | $103 | $84 | N/A | N/A | N/A | N/A | $18.7 | $0.0 |
| Brocade | 8.54 | 9.42 | 4.12 | 99% | 2,339 | 1,797 | 3.1x | 18.3x | 72.4x | 582.6 | 542.1 | 0.0 |
| CA | 24.84 | 29.50 | 18.97 | (14%) | 13,222 | 14,515 | 3.8x | 13.3x | 59.0x | 3,804.0 | 1,295.0 | 2,588.0 |
| CommVault | 19.29 | 20.74 | 14.74 | N/A | 886 | 846 | 8.1x | 69.8x | 79.0x | 104.3 | 50.2 | 10.0 |
| EMC | 14.36 | 14.75 | 9.44 | 8% | 32,365 | 31,895 | 3.3x | 14.4x | 28.1x | 9,664.0 | 2,669.5 | 2,200.0 |
| FalconStor | 8.23 | 9.80 | 5.99 | 0% | 424 | 385 | 9.4x | 73.9x | N/M | 41.0 | 39.0 | 0.0 |
| HP | 42.20 | 42.39 | 29.00 | 37% | 122,253 | 111,026 | 1.3x | 14.3x | 41.4x | 87,901.0 | 16,422.0 | 5,195.0 |
| IBM | 98.89 | 100.33 | 72.73 | 18% | 153,873 | 164,963 | 1.8x | 9.5x | 20.8x | 91,134.0 | 10,901.0 | 21,991.0 |
| Iron Mountain | 27.31 | 29.91 | 22.64 | (2%) | 5,424 | 8,013 | 3.9x | 14.1x | 72.1x | 2,078.2 | 45.4 | 2,634.8 |
| LSI Logic | 9.44 | 11.81 | 7.41 | 2% | 3,808 | 3,162 | 1.6x | 11.2x | N/M | 1,919.2 | 1,268.1 | 622.0 |
| Microsoft | 29.66 | 30.26 | 21.46 | 10% | 308,909 | 280,657 | 6.8x | 15.6x | 21.5x | 41,359.0 | 28,252.0 | 0.0 |
| Network Appliance | 39.73 | 41.56 | 25.85 | 39% | 15,924 | 14,738 | 7.7x | 38.4x | 54.5x | 1,920.3 | 1,379.4 | 193.4 |
| Overland Storage | 4.34 | 11.32 | 3.63 | (55%) | 56 | 10 | 0.0x | N/M | N/M | 232.8 | 45.8 | 0.0 |
| Quantum | 2.37 | 4.02 | 1.90 | (24%) | 460 | 966 | 1.1x | 23.5x | N/M | 868.6 | 150.9 | 656.5 |
| Seagate | 26.81 | 28.11 | 19.15 | 15% | 15,855 | 15,670 | 1.8x | 9.9x | 14.7x | 8,536.0 | 2,653.0 | 2,468.0 |
| Sun Microsystems | 6.00 | 6.25 | 3.74 | 33% | 21,367 | 17,980 | 1.5x | 28.8x | N/M | 11,664.0 | 3,971.0 | 584.0 |
| Symantec | 21.46 | 22.19 | 14.78 | 12% | 20,422 | 19,568 | 5.4x | 16.6x | N/M | 3,617.5 | 2,954.2 | 2,100.0 |
| Median | | | | | | | 3.2x | 15.6x | 47.9x | | | |
| Mean | | | | | | | 3.8x | 24.8x | 46.3x | | | |

Legend: A: annual; CY: calendar year; EBITDA: earnings before interest, taxes, depreciation, and amortization; EV: enterprise value (equity market value less cash plus long-term debt); LTM: last 12 months; N/A: not applicable; N/M: not meaningful
Data as of January 10, 2007
Sources: Capital IQ, The 451 Group, SVB Alliant, company press releases and Web sites, and miscellaneous news articles
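The enterprise-value figures in Table 8 follow the legend's definition, equity market value less cash plus long-term debt, and the multiples divide that EV by the CY05 metric. A quick arithmetic check against the EMC row, using the table's own numbers:

```python
# Enterprise value as used in Table 8, checked against the EMC row ($ in millions)
equity_market_value = 32_365
cash = 2_669.5
debt = 2_200

ev = equity_market_value - cash + debt
print(ev)  # 31895.5, matching the table's $31,895

revenue_cy05 = 9_664.0
print(f"EV/Revenue: {ev / revenue_cy05:.1f}x")  # 3.3x, as shown
```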
Table 10: Acronyms and Abbreviations

° ACLs: access control lists
° bots: servers that crawl the Internet looking for content of interest based on the search parameters; they can search for security flaws in a system and then catalog and report them, or exploit them
° CAS: content addressable storage
° CDP: continuous data protection
° CDT: continuous data technology
° CFR: Code of Federal Regulations
° CPU: central processing unit
° D2D: disk-to-disk
° DR: disaster recovery
° e-discovery: electronic data discovery
° EU: European Union
° EU DPD: European Union Data Protection Directive
° FICON: fiber connectivity
° GSIS: global single instance storage
° GLBA: U.S. Gramm-Leach-Bliley Act
° HDD: hard disk drive
° HIPAA: Health Insurance Portability and Accountability Act
° HR: lost worker productivity per hour of downtime
° ILM: information life cycle management
° IM: instant messaging
° malware: viruses, worms, bots, keystroke mappers
° NAS: network attached storage
° nearline storage: user- and application-accessible storage with lower performance than online
° NGDP: next generation data protection
° OSHA: Occupational Safety and Health Administration
° PC: personal computer
° ROBO: remote office branch office
° RPO: recovery point objective
° RTO: recovery time objective
° SAN: storage area network
° SATA: serial advanced technology attachment
° SAS: small aperture snapshot
° SIS: single instance storage
° SMB: small or medium-sized business
° SME: small or medium-sized enterprise
° SNIA: Storage Network Industry Association
° TB: terabyte
° TCP/IP: Transmission Control Protocol/Internet Protocol
° VPN: virtual private network
° VTL: virtual tape library
° WAN: wide area network

Table 9: Interview Tables

2006 SME Survey Results (178 surveyed)

| Issue | Strongly Agreed or Agreed | Percentage |
| Would like to assign the level of data protection by application | 87 | 49% |
| Single management for all data protection (VTL, CDP, backup disk or tape, archival, security, RPO/RTO, etc.) | 87 | 49% |
| Intuitive, even automatic, management of data protection that requires no user training | 94 | 53% |
| Demanding simpler, easier-to-use data protection technology | 94 | 53% |
| De-duplication can significantly reduce data protection WAN and storage cost | 94 | 53% |
| Replication of primary data is an equally high priority | 101 | 57% |
| Security is currently one of their highest priorities for data protection | 119 | 67% |
| Want their vendors to correct the situation quickly; are exploring alternatives | 123 | 69% |
| Worried current data protection is not effectively protecting ROBOs and could be noncompliant | 123 | 69% |
| Want their data protection to guarantee data recoverability | 130 | 73% |
| Prefer one data protection system to cover both the data center and the ROBO | 139 | 78% |
| E-discovery is becoming an important data protection driver, and its importance is growing | 139 | 78% |
| Prefer disk over tape backup for data protection if priced cost-effectively | 144 | 81% |
| Want to lower the cost of data protection while raising the functionality | 144 | 81% |

2006 SMB Survey Results (63 surveyed)

| Issue | Strongly Agreed or Agreed | Percentage |
| Storage security is not a high priority at this time; foresee no change soon | 43 | 68% |
| A service provider at the right price would be an attractive alternative for data protection | 47 | 75% |
| Lack data protection admin skills; want brain-dead simple data protection | 56 | 89% |
| Prefer disk over tape backup for data protection if priced cost-effectively | 57 | 90% |
| Don't differentiate application data (treat all the same) | 59 | 94% |

Source: Dragon Slayer Consulting
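The Percentage column in Table 9 is each "Strongly Agreed or Agreed" count divided by the number of respondents, rounded to the nearest whole percent. For example, for the top SME responses:

```python
respondents = 178  # 2006 SME survey size
agreed = 144       # e.g., "Prefer disk over tape backup..."

print(f"{round(100 * agreed / respondents)}%")  # 81%
```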
Select SVB Alliant Transactions

[Tombstone graphic: a gallery of "has been acquired by" and financing announcements dated April 2001 through November 2006, including acquisitions by VERITAS (now Symantec), a $13,000,000 Series B preferred stock financing, and a $12,000,000 Series D preferred stock financing. Acquirer and target names appear only as logos in the original.]
About SVB Alliant

SVB Alliant is an investment banking firm providing M&A and private capital advisory services to technology and life science companies. SVB Alliant's expertise spans the technology landscape, with deep subject-matter and execution experience in semiconductors, communications, storage, security, networking, peripherals and capital equipment, the Internet, software and services, and life sciences. The firm has offices in Palo Alto, California, and Boston and an affiliate in London. SVB Alliant is a member of global financial services firm SVB Financial Group, with SVB Silicon Valley Bank, SVB Analytics, SVB Capital, SVB Global, and SVB Private Client Services. Additional information is available at www.svballiant.com.

Contact Information

If you would like more information about the NGDP market, please contact:

Rick Dalton, SVB Alliant
Phone: 650.330.3799
E-mail: rdalton@svballiant.com

Melody Jones, SVB Alliant
Phone: 650.330.3076
E-mail: mjones@svballiant.com

References

1. Financial Services Modernization Act (Gramm-Leach-Bliley), United States Senate, http://banking.senate.gov/conf/grmleach.htm.
2. European Union Data Protection Directive, European Union, http://www.cdt.org/privacy/eudirective/EU_Directive_.html.
SVB Alliant, as part of its business, is regularly engaged in providing M&A and private placement advisory services to technology and life science companies. We may have in the past and may currently or in the future provide such services for a transaction-based fee to one or more of the companies mentioned in this piece. The securities offered have not been registered under the U.S. Securities Act of 1933, as amended, and may not be offered or sold in the United States absent registration or an applicable exemption from registration requirements. This material, including without limitation the statistical information herein, is provided for informational purposes only. The material, including all forward-looking projections, is based in part on information from third-party sources that we believe to be reliable, but neither the material nor the sources have been independently verified by us. As a result, we do not represent that the information is accurate or complete. Nothing relating to the material should be interpreted as a recommendation or solicitation or offer to buy or sell the securities of the companies mentioned herein. SVB Alliant is a wholly owned broker-dealer subsidiary of SVB Financial Group, the parent company of Silicon Valley Bank. Member NASD/SIPC. SVB Alliant's services are not bank products or services. The services of SVB Alliant are not guaranteed by the bank, are not FDIC-insured, and may lose value. SVB Alliant Europe Ltd. is registered in England and Wales at 34 Dover Street, London, W1S 4NG, U.K. under No. 5089363 and is authorised and regulated by the Financial Services Authority. All material presented, unless specifically indicated otherwise, is under copyright to SVB Alliant and its affiliates and is for informational purposes only.
None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied, or distributed to any other party without the prior express written permission of SVB Alliant. All trademarks, service marks, and logos used in this material are trademarks, service marks, or registered trademarks of SVB Alliant or one of its affiliates or other entities. © 2006 SVB Alliant℠. All rights reserved.
SVB Alliant Headquarters
181 Lytton Avenue
Palo Alto, California 94301 U.S.

SVB Alliant Boston
One Newton Executive Park, Suite 200
2221 Washington Street
Newton, Massachusetts 02462 U.S.

SVB Alliant Europe Ltd. London
34 Dover Street, 5th Floor
London W1S 4NG U.K.