Data Virtualization:
Revolutionizing data cloning
a.k.a. copy data management
kylehailey.com kyle@delphix.com @datavirt
Data virtualization
• Fast becoming the new norm
• Used by over 100 of the Fortune 500
• Enables DevOps
DevOps movement
• Clarify goals
• Define metrics
• Identify constraints
• Set priorities
• Iterate fast
• Continuous Integration
• Cloud
• Agile
• Kanban
• Kata
“IT is the factory floor of this century”
The Goal : Theory of Constraints
Improvement not made at the constraint is an illusion
Factory floor optimization
[Diagram: a factory floor with one station marked as the constraint]
Not a relay race
• Tuning before the constraint: stockpiling
• Tuning after the constraint: starvation
Factory floor: straightforward
Goal: find the constraint and optimize it
The Phoenix Project
What is the constraint in IT?
Put your energy into the constraint
Top 5 constraints in IT
1. Dev environments setup
2. QA setup
3. Code Architecture
4. Development
5. Product management
- Gene Kim
“One of the most powerful things that organizations can do is to enable development and testing to get the environment they need when they need it”
Data is the constraint
60% of projects over schedule
85% delayed waiting for data
CIO Magazine Survey:
only getting worse
Gartner: Data Doomsday, by 2017 one third of IT in crisis
In this presentation:
• Data Constraint
• Solution
• Use Cases
Typical Architecture
[Diagram, built up over several slides: the production instance (file system, database) is copied in full to a backup, a reporting instance, and Dev, QA, and UAT instances, each with its own file system and database]
Triple Tax
Copies
• Oracle customers: 8-12 copies per database
• Fortune 2000: thousands of multi-TB databases
• Downstream storage is staggering
– 3 petabytes at just one client
• Hardware
– storage, systems, network
– rack space, power, cooling
• People
– 1000s of hours per year just for DBAs
– DBAs
– Sys Admin
– Storage Admin
– Backup Admin
– Network Admin
• $10s of millions for data center modernizations
Copies require People & Time
companies unaware
Developer or Analyst
Boss, Storage Admin, DBA
Metrics
– Time
– Old Data
– Storage
Other
– Analysts
– Audits
– Data Center Modernization
"we say no, no, no until we can't say no anymore"
response when IT asked for copies of prod DB
Biggest problems in Application Development
1. Waiting to check in code
2. Production Bugs
3. Expensive Slow QA
Development : bottlenecks
Frustration Waiting
Development : Bugs
Old Unrepresentative Data
Development : subsets
False Negatives
False Positives
Bugs in Production
Production Wall
QA : Long setup times
[Chart for Bug X: cost to correct (axis 0 to 70) against the delay in fixing the bug (axis 1 to 7). Source: Software Engineering Economics, Barry Boehm (1981)]
QA : destructive tests refresh time
[Timeline: 20-minute tests alternating with 8-hour refreshes]
In this presentation:
• Data Constraint
• Solution
• Use Cases
Development QA UAT
99% of blocks are identical
Solution
Development QA UAT
Thin Clone
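The "99% of blocks are identical" observation can be turned into a rough storage estimate. The sketch below compares full physical copies with thin clones that share one image and store only their own changed blocks. The 1 TB source size is a hypothetical value, and compression is ignored.

```python
def physical_storage_tb(source_tb: float, copies: int) -> float:
    """Storage for full physical copies: every copy stores every block."""
    return source_tb * copies


def thin_clone_storage_tb(source_tb: float, copies: int,
                          shared_fraction: float) -> float:
    """One shared image plus each clone's private (changed) blocks."""
    private_per_clone = source_tb * (1.0 - shared_fraction)
    return source_tb + copies * private_per_clone


source_tb = 1.0  # hypothetical source size, not from the slides
copies = 3       # Development, QA, UAT

print(physical_storage_tb(source_tb, copies))
print(thin_clone_storage_tb(source_tb, copies, shared_fraction=0.99))
```

With these inputs the thin clones need only slightly more than the size of a single copy, while the physical copies need three times the source size.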
Technology Core: file system snapshots
• EMC Symmetrix
– 16 snapshots
– Write performance impact
– No snapshots of snapshots
• NetApp & EMC VNX
– 255 snapshots
• ZFS
– Compression
– Unlimited snapshots
– Snapshots of Snapshots
• DxFS
– Compression
– Unlimited snapshots
– Snapshots of Snapshots
– Shared cache in memory
Also check out new SSD storage such as Pure Storage and EMC XtremIO
Snapshot 1 – full backup once only at link time
Jonathan Lewis © 2013
Virtual DB
a b c d e f g h i
We start with a full backup, analogous to a level 0 RMAN backup. It includes
the archived redo log files needed for recovery. The source must run in ARCHIVELOG mode.
Snapshot 2 (from SCN)
Jonathan Lewis © 2013
b' c'
a b c d e f g h i
The "backup from SCN" is analogous to a level 1
incremental backup (which includes the relevant
archived redo logs). It is sensible to enable block change tracking (BCT).
Delphix executes standard RMAN scripts.
Apply Snapshot 2
Jonathan Lewis © 2013
a b c d e f g h i b' c'
The Delphix appliance unpacks the RMAN backup and "overwrites" the
initial backup with the changed blocks, but DxFS writes the changed blocks
as new copies, so the original blocks are preserved.
Drop Snapshot 1
Jonathan Lewis © 2013
b' c' a d e f g h i
The call to RMAN leaves us with a new level 0 backup, waiting for recovery.
But we can pick the snapshot root block. We have EVERY level 0 backup.
Creating a vDB
Jonathan Lewis © 2013
b' c' a d e f g h i
The first step in creating a vDB is to take a snapshot of the filesystem as of
the backup you want (then roll it forward).
My vDB (filesystem)
Your vDB (filesystem)
b' c' a d e f g h i
Creating a vDB (continued)
Jonathan Lewis © 2013
[Diagram: "My vDB" and "Your vDB" are separate filesystems over the same blocks (b' c' a d e f g h i); a block changed inside a vDB appears as a new copy (i')]
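The snapshot and vDB slides above can be modeled with a small copy-on-write sketch. This is a toy model under stated assumptions, not how DxFS or ZFS is implemented: the `Snapshot` and `VirtualDB` classes are hypothetical, and blocks are represented by the letters used on the slides.

```python
class Snapshot:
    """Immutable mapping from block position to block contents."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, pos):
        return self._blocks[pos]

    def apply_changes(self, changed):
        """Return a new snapshot; the original snapshot is left untouched."""
        merged = dict(self._blocks)  # copies references, not block contents
        merged.update(changed)
        return Snapshot(merged)


class VirtualDB:
    """Writable clone: private writes layered over a shared snapshot."""

    def __init__(self, base):
        self._base = base
        self._private = {}

    def read(self, pos):
        if pos in self._private:
            return self._private[pos]
        return self._base.read(pos)

    def write(self, pos, data):
        self._private[pos] = data  # the shared snapshot is never modified

    def private_block_count(self):
        return len(self._private)


# Snapshot 1: the initial full backup, blocks a..i.
snap1 = Snapshot({p: p for p in "abcdefghi"})
# Snapshot 2: the incremental backup changes blocks b and c.
snap2 = snap1.apply_changes({"b": "b'", "c": "c'"})

# Two vDBs provisioned from the same snapshot.
my_vdb, your_vdb = VirtualDB(snap2), VirtualDB(snap2)
my_vdb.write("i", "i'")
```

In this model a write to one vDB adds a single private block and leaves both the shared snapshot and the other vDB unchanged, which is the behaviour the diagrams depict with the extra i' block.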
Fuel does not equal a car
Challenges
1. Technical
2. Bureaucracy
Bureaucracy
• Developer: asks for DB, gets access
• Manager: approves
• DBA: requests system, sets up DB
• System Admin: requests storage, sets up machine
• Storage Admin: allocates storage (takes snapshot)
Why are handoffs so expensive?
1 hour, 1 day, 9 days
Technical Challenge
[Diagrams: source database LUNs on the production filer, with snapshots and clones; a development filer; instances on Target A, Target B, and Target C]
Technical Challenge
1. Copy: Time Flow, Purge
2. Clone (snapshot): Compress, Share Cache
3. Provision: Mount, recover, rename; Self Service, Roles & Security
[Diagram: Production (file system, instance), Storage, Target (instance)]
How to get data virtualization?
1. Source sync
2. Storage snapshots
3. Target deploy
Product        Source sync   Storage snapshots   Deploy automation
ZFS            -             Yes (unlimited)     -
EMC            SRDF          Yes (16 or 255)     -
NetApp         SMO           Yes (255)           -
Oracle EM 12c  Data Guard    NetApp, ZFS         Yes (Oracle only, no branching)
Actifio        Yes           Yes                 Yes (no branching)
Delphix        Yes           Yes                 Yes
Actifio
[Diagram: production instances, Actifio appliances, target instances]
Oracle Snap Clone
[Diagrams: three layouts combining production instances, Data Guard, EM 12c, target instances, and either ZFSSA or NetApp storage or Solaris ZFS on any storage]
Incremental forever: collect changes
[Diagram: production instances, Time Flow of changes, target instances connected over NFS]
Database Virtualization
Three Physical Copies
Three Virtual Copies
Data Virtualization Appliance
Before Virtual Data
[Diagram: Production, Reporting, Backup, and Dev, QA, UAT each have their own instance, file system, and database: the “triple data tax”]
With Virtual Data
[Diagram: Production, Dev & QA, Reporting, and Backup instances; a single file system and database; Data Virtualization Appliance]
• Problem in the Industry
• Solution
• Use Cases
Use Cases
1. Development and QA
2. Production Support
3. Business
Development: Virtual Data
Development
• Fast
• Free
• Full size
• Self service
Virtual Data: Easy
[Diagram: Source, DVA, four instances]
Development Virtual Data: Parallelize
gif by Steve Karam
Development Virtual Data: Full size
Development Virtual Data: Self Service
QA : Virtual Data
• Fast
• Parallel
• Rollback
• A/B testing
[Diagram: Prod instance, DVA, Dev and QA]
• Eliminate build time
• Find bugs Fast
• Run Parallel QA
QA Virtual Data : Parallel
Production Time Flow
QA Virtual Data : Fast Refresh
• Fast
• Full
• Fresh
• Efficient
[Timeline: 20-minute tests and 8-hour refreshes]
QA with Virtual Data: Rewind
[Diagram: Prod, DVA, QA instance]
Production Time Flow
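One way to picture provisioning or rewinding from a time flow is as a lookup of the latest snapshot at or before a chosen moment. The sketch below is an illustration only: the `TimeFlow` class, its methods, and the snapshot names are hypothetical and do not describe any product's API.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TimeFlow:
    """Ordered record of snapshots taken from a source."""
    times: list = field(default_factory=list)  # sorted snapshot timestamps
    names: list = field(default_factory=list)  # snapshot label per timestamp

    def add_snapshot(self, time, name):
        idx = bisect.bisect_right(self.times, time)
        self.times.insert(idx, time)
        self.names.insert(idx, name)

    def snapshot_at(self, time):
        """Latest snapshot taken at or before `time`."""
        idx = bisect.bisect_right(self.times, time)
        if idx == 0:
            raise ValueError("no snapshot at or before the requested time")
        return self.names[idx - 1]


flow = TimeFlow()
for hour, name in [(0, "snap-00h"), (8, "snap-08h"), (16, "snap-16h")]:
    flow.add_snapshot(hour, name)

# Rewinding a QA environment to its state at hour 9 means re-provisioning
# from the snapshot selected here.
before_test = flow.snapshot_at(9)
```

Because a rewind in this model only selects an existing snapshot, it involves no data copy, which is what makes repeated destructive tests cheap.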
QA with Virtual Data: A/B
[Diagram: DVA and instances, one with Index 1 and one with Index 2]
Production Time Flow
Data Version Control
1/30/2015
[Diagram: Dev and QA environments for versions 2.1 and 2.2; Prod instance, DVA, Production Time Flow]
Use Cases
1. Development and QA
2. Production Support
3. Business
Production Support
• Backups
• Recovery
• Forensics
• Migration
• Consolidation
9 TB database, 1 TB change per day
Storage requirements for 30 days of backups
[Chart: storage required (axis 0 to 70) over weeks 1 to 4, comparing original, Oracle, and Delphix]
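The slide's inputs (a 9 TB database, 1 TB of change per day, 30 days of backups) are enough for a rough comparison of two backup schemes. The sketch below is an assumption-laden illustration: the two schemes and their function names are chosen for the example, compression is ignored, and the chart's own assumptions are not recoverable from the slide.

```python
def weekly_full_daily_incremental_tb(db_tb, change_tb_per_day, days):
    """One full backup every 7th day, an incremental on every other day."""
    fulls = (days + 6) // 7          # days 1, 8, 15, ...
    incrementals = days - fulls
    return fulls * db_tb + incrementals * change_tb_per_day


def incremental_forever_tb(db_tb, change_tb_per_day, days):
    """A single initial full copy, then only changed blocks each day."""
    return db_tb + (days - 1) * change_tb_per_day


db_tb, change_tb, days = 9, 1, 30  # figures from the slide

print(weekly_full_daily_incremental_tb(db_tb, change_tb, days))
print(incremental_forever_tb(db_tb, change_tb, days))
```

The gap between the two results comes entirely from the repeated full backups; keeping a single full copy and collecting only changes removes that cost.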
Recovery
[Diagram: Source instance (drop), DVA, recovered VDB instance, Production Time Flow]
Forensics
[Diagram: Source, DVA, Development instance, Production Time Flow]
Development (the new production)
[Diagram: Source, DVA, Development instances, Prod & VDB Time Flow]
Migration
Use Cases
1. Development and QA
2. Production Support
3. Business Intelligence
Business Intelligence
• ETL
• Temporal
• Confidence Testing
• Federated Databases
• Audits
Business Intelligence: ETL and Refresh Windows
[Timeline: 1pm, 10pm, 8am, noon]
Business Intelligence: batch taking too long
[Timelines for 2011 to 2015, with marks at 1pm, 10pm, 8am, noon; then 10pm, 8am, noon, 9pm; then 6am, 8am, 10pm]
Business Intelligence: ETL and DW Refreshes
[Diagram: Prod instance, DW & BI instance]
• Collect only changes
• Refresh in minutes
[Diagram: Prod instance, DVA, BI and DW instance; ETL 24x7]
Virtual Data: Fast Refreshes
Production Time Flow
Temporal Data
Confidence testing
Modernization: Federated
[Diagram: Source1 and Source2 instances, DVA, Production Time Flow 1, Production Time Flow 2]
Modernization: Federated
“I looked like a hero”
Tony Young, CIO Informatica
Production Time Flow
Audit
[Diagram: Prod instance, DVA]
Live Archive
Use Case Summary
1. Development & QA
2. Production Support
3. Business
How expensive is the Data Constraint?
DVA at Fortune 500 :
Dev throughput increase by 2x
Faster
• Financial Close
• BI refreshes
• Surgical recovery
• Projects
Virtual Data Quotes
• Projects “12 months to 6 months.”
– New York Life
• Insurance product “about 50 days ... to about 23 days”
– Presbyterian Health
• “Can't imagine working without it”
– State of California
Summary
• Problem: Data is the constraint
• Solution: Virtualize Data
• Results:
– Half the time for projects
– Higher quality
– Increased revenue
Thank you!
• Kyle Hailey | Oracle ACE and Technical Evangelist, Delphix
– Kyle@delphix.com
– kylehailey.com
– slideshare.net/khailey
– @datavirt
