Data Virtualization: Revolutionizing data cloning

a.k.a. copy data management
kylehailey.com kyle@delphix.com @datavirt

Data virtualization
• Fast becoming the new norm
• Used by over 100 of the Fortune 500
• Enables DevOps

DevOps movement
• Clarify goals
• Define metrics
• Identify constraints
• Set priorities
• Iterate fast

DevOps:
• Continuous Integration
• Cloud
• Agile
• Kanban
• Kata
“IT is the factory floor of this century”

The Goal: Theory of Constraints
Improvement not made at the constraint is an illusion
Factory floor optimization

Factory floor
constraint
Not a relay race

Tune before constraint
constraint
Tuning here
Stock piling

Tune after constraint
constraint
Tuning here
Starvation

Factory floor: straightforward
constraint
Goal: find the constraint and optimize it
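The factory-floor slides above can be checked with a toy model. This is a minimal sketch, assuming a serial line whose throughput is simply the capacity of its slowest stage; the stage names and numbers are hypothetical and not from the slides.

```python
def throughput(capacities):
    """Units per hour a serial line can deliver: the slowest stage sets the pace."""
    return min(capacities.values())

# Hypothetical stage capacities in units per hour; "test" is the constraint.
line = {"design": 10, "build": 8, "test": 3, "deploy": 9}

baseline = throughput(line)
tuned_before = throughput({**line, "build": 16})   # work stockpiles in front of "test"
tuned_after = throughput({**line, "deploy": 18})   # "deploy" starves, waiting on "test"
tuned_at = throughput({**line, "test": 6})         # only this change raises output

print(baseline, tuned_before, tuned_after, tuned_at)  # 3 3 3 6
```

Tuning before or after the constraint leaves output unchanged; only tuning at the constraint moves it.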

The Phoenix Project
What is the constraint in IT?

Put your energy into the constraint
Top 5 constraints in IT
1. Dev environments setup
2. QA setup
3. Code Architecture
4. Development
5. Product management
- Gene Kim
“One of the most powerful things that organizations can do is to enable development and testing to get the environment they need when they need it”

Data is the constraint
60% Projects Over Schedule
85% delayed waiting for data
CIO Magazine Survey:
only getting worse
Gartner: Data Doomsday, by 2017 1/3 of IT in crisis

In this presentation:
• Data Constraint
• Solution
• Use Cases

Typical Architecture
[Diagram, built up over several slides: a Production instance with its file system and database is copied in full for Backup, then Reporting, then Dev, QA and UAT. Each copy has its own instance, file system and database.]
Triple Tax

Copies
• Oracle customers: 8-12 copies per database
• Fortune 2K: 1000s of multi-TB databases
• Downstream storage is staggering
– 3 petabytes at just one client

Copies require People & Time
• Hardware
– storage, systems, network
– rack space, power, cooling
• People
– 1000s of hours per year just for DBAs
– DBAs
– Sys Admin
– Storage Admin
– Backup Admin
– Network Admin
• $10s of millions for data center modernizations

Companies unaware
Developer or Analyst; Boss, Storage Admin, DBA

Companies unaware
Metrics
– Time
– Old Data
– Storage
Other
– Analysts
– Audits
– Data Center Modernization
"We say no, no, no until we can't say no anymore"
– IT's response when asked for copies of the prod DB

Biggest problems in Application Development
1. Waiting to check in code
2. Production Bugs
3. Expensive, Slow QA

Development : bottlenecks
Frustration Waiting

Development : Bugs
Old Unrepresentative Data

Development : subsets
False Negatives
False Positives
Bugs in Production

QA : Long setup times
[Chart: cost to correct (y-axis, 0 to 70) against delay in fixing the bug (x-axis, 1 to 7). Source: Software Engineering Economics, Barry Boehm (1981)]

QA: destructive tests, refresh time
[Timeline: 8-hour refreshes alternating with 20-minute tests]

Development, QA, UAT
99% of blocks are identical
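To see what the 99% figure means for storage, here is a minimal sketch. It assumes three copies (Development, QA, UAT) of a hypothetical one-million-block database, with identical blocks stored once and the remaining 1% stored privately per copy. The block count and function names are illustrative only.

```python
def physical_blocks(total_blocks, copies):
    """Every copy stores every block."""
    return total_blocks * copies

def shared_blocks(total_blocks, copies, identical_percent=99):
    """Identical blocks are stored once; only the differing blocks are per copy."""
    shared = total_blocks * identical_percent // 100
    unique_per_copy = total_blocks - shared
    return shared + unique_per_copy * copies

total = 1_000_000  # hypothetical database size in blocks
print(physical_blocks(total, copies=3))  # 3000000
print(shared_blocks(total, copies=3))    # 1020000
```

With sharing, three copies cost about as much as one.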

Technology Core: file system snapshots
• EMC Symmetrix
– 16 snapshots
– Write performance impact
– No snapshots of snapshots
• NetApp & EMC VNX
– 255 snapshots
• ZFS
– Compression
– Unlimited snapshots
– Snapshots of snapshots
• DxFS
– Compression
– Unlimited snapshots
– Snapshots of snapshots
– Shared cache in memory
Also check out new SSD storage such as Pure Storage and EMC XtremIO

Snapshot 1 – full backup once only at link time
Jonathan Lewis © 2013
Virtual DB
a b c d e f g h i
We start with a full backup, analogous to a level 0 RMAN backup. It includes the archived redo log files needed for recovery. The source must run in archivelog mode.

Snapshot 2 (from SCN)
b' c'
a b c d e f g h i
The "backup from SCN" is analogous to a level 1 incremental backup (which includes the relevant archived redo logs). It is sensible to enable block change tracking (BCT).
Delphix executes standard RMAN scripts.

Apply Snapshot 2
b' c'
a b c d e f g h i
The Delphix appliance unpacks the RMAN backup and "overwrites" the initial backup with the changed blocks, but DxFS makes new copies of the blocks, so the originals are still there.

Drop Snapshot 1
a b' c' d e f g h i
The call to RMAN leaves us with a new level 0 backup, waiting for recovery. But we can pick the snapshot root block: we have EVERY level 0 backup.
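The snapshot slides above can be mimicked with a toy block map. This is only a sketch of the copy-on-write idea, not DxFS or RMAN: the class and method names are invented for illustration, and blocks are plain strings.

```python
class SnapshotStore:
    """Toy copy-on-write store: a snapshot maps block names to stored versions."""

    def __init__(self, level0):
        self.versions = {}    # version id -> block contents
        self.snapshots = []   # each snapshot: {block name: version id}
        self._next_id = 0
        self.snapshots.append({n: self._store(d) for n, d in level0.items()})

    def _store(self, data):
        vid = self._next_id
        self._next_id += 1
        self.versions[vid] = data
        return vid

    def apply_incremental(self, changed):
        """Add a snapshot: changed blocks get new versions, the rest are shared."""
        snap = dict(self.snapshots[-1])
        for name, data in changed.items():
            snap[name] = self._store(data)
        self.snapshots.append(snap)

    def drop(self, index):
        """Drop a snapshot and free versions no remaining snapshot references."""
        del self.snapshots[index]
        live = {vid for snap in self.snapshots for vid in snap.values()}
        self.versions = {v: d for v, d in self.versions.items() if v in live}

    def read(self, index):
        """Every snapshot reads as a complete (level 0 style) image."""
        snap = self.snapshots[index]
        return [self.versions[snap[name]] for name in sorted(snap)]


store = SnapshotStore({name: name for name in "abcdefghi"})  # snapshot 1
store.apply_incremental({"b": "b'", "c": "c'"})              # snapshot 2
print(len(store.versions))   # 11: nine originals plus b' and c'
print(store.read(0))         # the original image is still intact
store.drop(0)                # drop snapshot 1
print(len(store.versions))   # 9: old b and c are freed
print(store.read(0))         # ['a', "b'", "c'", 'd', 'e', 'f', 'g', 'h', 'i']
```

Each incremental costs only the changed blocks, yet any retained snapshot can be read as a full image.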

Creating a vDB
a b' c' d e f g h i
The first step in creating a vDB is to take a snapshot of the filesystem as at the backup you want (then roll it forward).
My vDB (filesystem)
Your vDB (filesystem)

Creating a vDB
[Diagram: My vDB and Your vDB each see the blocks a b' c' d e f g h i through their own filesystem snapshot. A block changed inside a vDB is stored as a new copy, i'.]
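The two vDBs sharing one set of blocks can be sketched as follows. This is a minimal illustration under simplifying assumptions: the snapshot is a plain dictionary, the roll-forward step is not modeled, and the `VirtualDB` class is hypothetical rather than anything Delphix exposes.

```python
class VirtualDB:
    """Toy vDB: reads fall through to a shared snapshot unless the block was rewritten."""

    def __init__(self, snapshot):
        self._snapshot = snapshot   # shared between vDBs, never modified
        self._private = {}          # blocks this vDB has changed

    def write(self, name, data):
        self._private[name] = data  # copy-on-write: the shared block is untouched

    def read(self, name):
        return self._private.get(name, self._snapshot[name])

    def private_block_count(self):
        return len(self._private)


snapshot = {"a": "a", "b": "b'", "c": "c'", **{n: n for n in "defghi"}}
mine = VirtualDB(snapshot)
yours = VirtualDB(snapshot)

mine.write("i", "i'")
print(mine.read("i"), yours.read("i"))                          # i' i
print(mine.private_block_count(), yours.private_block_count())  # 1 0
```

A new vDB costs no block storage until it writes, and one vDB's changes are invisible to the other.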

Fuel is not a car
Challenges
1. Technical
2. Bureaucracy

Bureaucracy
[Flow: Developer asks for DB → Manager approves → DBA requests system, sets up DB → System Admin requests storage, sets up machine → Storage Admin allocates storage (takes snapshot) → Developer gets access]

Bureaucracy
Why are handoffs so expensive?
1 hour
1 day
9 days

Technical Challenge
[Diagram: the Source instance's database LUNs sit on the Production Filer. A snapshot and its clones on the same filer serve the instances on Target A, Target B and Target C.]

Technical Challenge
[Diagram: database LUNs on the Production Filer; the snapshot and clones live on a separate Development Filer that serves the instances on Target A, Target B and Target C.]

Technical Challenge
1. Copy: Production file system to storage (Time Flow, Purge)
2. Clone (snapshot): Compress, Share Cache
3. Provision to target instance: Mount, recover, rename; Self Service, Roles & Security

How to get Data Virtualization?
1. Source sync  2. Storage snapshots  3. Deploy (to target)

                Source sync   Storage snapshots   Deploy automation
ZFS                           Yes (unlimited)
EMC             SRDF          Yes (16 or 255)
NetApp          SMO           Yes (255)
Oracle EM 12c   Data Guard    NetApp, ZFS         Yes (Oracle only, no branching)
Actifio         Yes           Yes                 Yes (no branching)
Delphix         Yes           Yes                 Yes

Actifio
[Diagram: Production instances → Actifio appliance → instances on targets]

Oracle Snap Clone
[Diagram 1: Production on ZFSSA or NetApp storage; EM 12c provisions instances on targets.]
[Diagram 2: Production → Data Guard → ZFSSA or NetApp; EM 12c provisions instances on targets.]
[Diagram 3: Production instances on any storage → Data Guard on Solaris ZFS; EM 12c provisions instances on targets.]

Incremental forever: collect changes
[Diagram: changes from the Production instance are collected into the Time Flow; target instances access it over NFS.]

Three Physical Copies
Three Virtual Copies
Data Virtualization Appliance

Before Virtual Data
[Diagram: Production plus full physical copies for Dev, QA, UAT, Reporting and Backup, each with its own instance, file system and database: the “triple data tax”.]

With Virtual Data
[Diagram: the Production instance feeds one file system and database on the Data Virtualization Appliance, which serves the Dev & QA, Reporting and Backup instances.]

• Problem in the Industry
• Solution
• Use Cases

Use Cases
1. Development and QA
2. Production Support
3. Business

Use Cases
1. Development & QA
2. Production Support
3. Business

Development: Virtual Data
Development
* Fast * Free * Full size * Self service

Virtual Data: Easy
[Diagram: one Source feeds the DVA, which serves several instances.]

Development Virtual Data: Parallelize
gif by Steve Karam

Development Virtual Data: Full size

Development Virtual Data: Self Service

QA : Virtual Data
• Fast
• Parallel
• Rollback
• A/B testing

QA Virtual Data: Parallel
• Eliminate build time
• Find bugs fast
• Run parallel QA
[Diagram: Prod instance → DVA (Production Time Flow) → Dev and QA]

QA Virtual Data: Fast Refresh
• Fast
• Full
• Fresh
• Efficient
[Timeline: 8-hour refresh blocks and 20-minute test blocks]

QA with Virtual Data: Rewind
[Diagram: Prod → DVA → QA instance]

QA with Virtual Data: A/B
[Diagram: the DVA serves several instances; Index 1 and Index 2 are tested side by side.]

Data Version Control
1/30/2015
[Diagram: Dev and QA environments for versions 2.1 and 2.2, taken from the DVA's Production Time Flow of the Prod instance.]

Production Support
• Backups
• Recovery
• Forensics
• Migration
• Consolidation

9 TB database, 1 TB change per day
30 days of backups: storage requirements
[Chart: storage (y-axis, 0 to 70) for week 1 to week 4, three series: original, Oracle, Delphix]
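A rough way to reason about these storage requirements is sketched below. It uses the slide's 9 TB database and 1 TB of daily change, but everything else is an assumption made for illustration: a 28-day window, a conventional schedule of weekly fulls plus daily incrementals, an incremental-forever schedule of one full plus daily changes, and no compression. The results are not the values plotted in the chart.

```python
def scheduled_fulls_tb(db_tb, change_tb, days, full_every):
    """Storage for a full backup every `full_every` days, incrementals otherwise."""
    fulls = len(range(0, days, full_every))
    incrementals = days - fulls
    return fulls * db_tb + incrementals * change_tb

def incremental_forever_tb(db_tb, change_tb, days):
    """Storage for one initial full backup followed only by daily changes."""
    return db_tb + (days - 1) * change_tb

# Slide figures: 9 TB database, 1 TB change per day. The 28-day window and
# the weekly-full schedule are assumptions.
print(scheduled_fulls_tb(9, 1, days=28, full_every=7))  # 60
print(incremental_forever_tb(9, 1, days=28))            # 36
```

Compression and block sharing, which the snapshot file systems described earlier provide, would lower the second figure further; that effect is left out here.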

Recovery
Instance
Instance
Recover VDB
Drop
Source
DVA Production Time Flow

Forensics
Instance
Development
DVA
Source

Development (the new production)
Instance
Development
DVA
Source
Development
Prod & VDB Time Flow

Use Cases
1. Development and QA
2. Production Support
3. Business Intelligence

Business Intelligence
• ETL
• Temporal
• Confidence Testing
• Federated Databases
• Audits

Business Intelligence: ETL and Refresh Windows
1pm 10pm 8am
noon

Business Intelligence: batch taking too long
[Chart: batch windows per year, 2011 to 2015, on a daily timeline marked 1pm, 10pm, 8am and noon. A second version of the chart carries the marks 10pm, 8am, noon, 9pm and 6am, 8am, 10pm.]

Business Intelligence: ETL and DW Refreshes
Instance
Prod
Instance
DW & BI

Virtual Data: Fast Refreshes
• Collect only changes
• Refresh in minutes
[Diagram: Prod instance → DVA → BI and DW instance; ETL runs 24x7]

Modernization: Federated
Instance
Instance
Source1
Source2
DVA
Production Time Flow 1
Production Time Flow 2

Modernization: Federated
“I looked like a hero”
– Tony Young, CIO Informatica

Audit
Instance
Prod
DVA
Live Archive

Use Case Summary
1. Development & QA
2. Production Support
3. Business

How expensive is the Data Constraint?
DVA at a Fortune 500 company:
Dev throughput increased by 2x

How expensive is the Data Constraint?
Faster
• Financial Close
• BI refreshes
• Surgical recovery
• Projects

Virtual Data Quotes
• Projects “12 months to 6 months.”
– New York Life
• Insurance product “about 50 days ... to about 23 days”
– Presbyterian Health
• “Can't imagine working without it”
– State of California

Summary
• Problem: Data is the constraint
• Solution: Virtualize Data
• Results:
• Half the time for projects
• Higher quality
• Increased revenue

Thank you!
• Kyle Hailey | Oracle ACE and Technical Evangelist, Delphix
– Kyle@delphix.com
– kylehailey.com
– slideshare.net/khailey
– @datavirt
