Agile Data Platform:
Revolutionizing Database Cloning
Kyle Hailey
http://kylehailey.com
Problem in IT

Get the right data
To the right people
At the right time
Sandbox

Development

QA

Production

UAT

Business
Intelligence
Forensics

Backup

Data Guard

Tape
Part I : Cloning Technology

Physical

Thin Clone

Virtual

Part II : Agile Data Acceleration
Database Cloning Challenge

If you can’t satisfy the business demands
then your process is broken.
Problem
Reports

Production

First
copy
QA and UAT

• CERN - European Organization for Nuclear
Research

• 145 TB database...
Tradeoff: Speed, Quality, Cost
What We’ve Seen
“if you can't measure it you can’t manage it”

1.
2.
3.
4.
5.

Inefficient QA: Higher costs of QA
QA Delay...
1. Inefficient QA: Long Build times

Build

QA Test
Build Time

96% of QA time was building environment
$.04/$1.00 actual ...
2. QA Delays: bugs found late require more code re-work
Build QA Env

Sprint 3

Sprint 2

Sprint 1

X

Build QA Env

QA

B...
3. Full Copy Shared : Bottlenecks

Old Unrepresentative Data

Frustration Waiting
4. Subsets : cause bugs
4. Subsets : cause bugs
The Production ‘Wall’

Classic problem is that queries that
run fast on subsets hit the wall in
pr...
5. Slow Environment Builds: 3-6 Months to Deliver
Data

Developers

Management

Submit
Request
Approve
Request $$
(2 Weeks...
5. Slow Environment Builds: culture of no
DBA

Developer
Never enough environments
bottlenecks
What We’ve Seen

1.
2.
3.
4.
5.

Inefficient QA: Higher costs
QA Delays : Increased re-work
Sharing DB : Bottlenecks
Subse...
99% of blocks are identical

Clone 1

Clone 2

Clone 3
Thin Clone
Clone 1

Clone 2

Clone 3
2. Thin Cloning

I. Clonedb Oracle
II. EMC
• Copy on first write (COFW)
III. Netapp
• write anywhere file system (WAFL)
• ...
I.

clonedb

dNFS
sparse file

RMAN
backup
CloneDB

1. dNFS 11.2.0.2+
– cd $ORACLE_HOME/rdbms/lib
– make -f ins_rdbms.mk dnfs_on

2. Clonedb.pl initSOURCE.ora output...
Thin Cloning

I. Clonedb Oracle
II. EMC
• Copy on first write (COFW)
III. Netapp
• write anywhere file system (WAFL)
• & E...
II.

EMC Copy on Write
Active
File
System

Snapshot

A

B

C

D
II.

EMC Copy on Write
Active
File
System

Snapshot

A

B

C

D

Write penalty (read and two writes)
Limit 16 snapshots
No...
I. Clonedb Oracle
II. EMC
• Copy on first write (COFW)
III. Netapp
• write anywhere file system (WAFL)
• & EMC VNX redirec...
III. Netapp and EMC VNX
root

Data Blocks
• 255 snapshots
• Branching possible
I. Clonedb Oracle
II. EMC
• Copy on first write (COFW)
III. Netapp
• write anywhere file system (WAFL)
• & EMC VNX redirec...
IV. ZFS Allocate on Write
Snapshot root

Live
root

Zil
Intent Log

Unlimited Instantaneous Snapshots
Unlimited Instantane...
2. Thin Cloning
I.
II.
III.
IV.

Clonedb Oracle
EMC
Netapp
ZFS

1.
2.
3.
4.
5.

Put database in hot backup
Take Snapshot
C...
Problem: How do you get data off
Production?

Target A
Instance
Instance

Development Filer

Production Filer

clones
snap...
Three Core Parts
Production
Instance

1
1.
2.

Copy
Sync

Storage

File System

2
Clone
(snapshot)

Development

Instance
...
Three Core Parts
Production
Instance

1
1. Copy
2. Sync

Storage

File System

Development

Instance
Netapp 1
SnapManager
Repository

DBA

Snap
Manager
Protection
Manager

tr-3761.pdf
RMAN
Repository

Snap
Manager

Flex Clo...
Netapp 1
Target A
NetApp Filer - Production

Production

NetApp Filer - Development

Snap
Protection
mirror Manage

Snap
D...
Three Core Parts
Production
Instance

Storage

File System

Development

Instance

3
Mount, recover, rename
Roles & secur...
3

Oracle EM 12c Snap Clone

EM 12c
Test
Master

Source
Instance

?

instance

Profile
•
•
•
•
•
•
•
•

Register Netapp or...
Where we Are

Production
Instance
Instance

Database

File system
File system

QA

UAT

Instance
Instance

Instance
Instan...
Want to be here
Production
Instance
Instance

Database

Development

QA

UAT

Instance
Instance

Instance
Instance

Instance
...
EM 12c: Snap Clone

Production
Flexclone

Development
Flexclone
Netapp
Snap Manager for Oracle
Thin Cloning
3. Database Virtualization
Three Physical Copies

Three Virtual Copies

Data
Virtualization
Appliance
SMU

ZFS Storage Appliance

Choose your virtualization Layer:
• Delphix and Oracle SMU
Oracle 12c SMU
Oracle Snap Management Utility for ZFS Appliance

• Requires ZFS Appliance
• Supports Linux , Solaris 10+, ...
Install Delphix on x86 hardware

Intel hardware
Allocate Any Storage to Delphix

Allocate Storage
Any type
One time backup of source database
Production
Supports
Instance

Database

File system

Upcoming
DxFS (Delphix) Compress Data
Production
Instance

Database

File system

Data is
compressed
typically 1/3
size
Incremental forever change collection
Production
Instance

Database

Changes
Time Window

File system

• Collected increme...
Typical Architecture

Production
Instance
Instance

Database

File system
File system

QA

UAT

Instance
Instance

Instanc...
With Delphix
Production
Instance
Instance

Database

File system

Development

QA

UAT

Instance
Instance

Instance
Instan...
Three Core Parts

1

2

Production

3
Development

Instance

Instance

Time Window

Source Syncing

Storage (DxFS)Self Ser...
Fast, Fresh, Full
Source

Development VDB

Instance

Instance

Time Window
Free
Instance

Source
Instance

Source
Time Window
Instance

Instance

gif by Steve Karam
Self Service
Branching
Dev VDB

Source

Instance

Instance

Source
Time Window
Dev1 VDB
Time Window
End of Sprint
Or a Code Freeze

QA ...
Federated Cloning
Federated
Instance

Source1
Instance

Source2

Source1
Time Window

Instance

Source2
Time Window

Instance
“I looked like a hero”
Tony Young, CIO Informatica
DevOps
DevOps With Delphix

1.
2.
3.
4.
5.

Efficient QA: Low cost, high utilization
Quick QA : Fast Bug Fix
Every Dev gets DB: P...
1. Efficient QA: Lower cost

Build

QA Test
Build Time
Build Time

QA Test

1% of QA time was building environment...
Rapid QA via Branching
2. QA Immediate: bugs found fast and fixed
Build QA Env

Sprint 2

Sprint 1

X

QA

Build QA Env

QA

Sprint 3

Bug Code...
3. Private Copies: Parallelize
4. Full Size DB : Eliminate bugs
5. Self Service: Fast, Efficient. Culture of Yes!

Developers

Management

Submit
Request
Approve
Request $$
(2 Weeks)

Ap...
Quality

• Forensics
• A/B testing
• Recovery
Investigate Production Bugs
Development
Instance

Instance

Time Window

Anomaly on Prod
Possible code bug
At noon yesterd...
Rewind for patch and QA testing
Prod

Development

Instance

Instance

Time Window

Time Window
A/B testing

Instance

Test A with Index 1

Instance

Instance

Time Window

• Keep tests for compare
• Production vs Virt...
Surgical recover of Production
Source

Development

Instance

Instance

Spin VDB up
Before drop
Time Window
Problem on Pro...
Surgical or Full Recovery on VDB
Dev1
Source

Instance

Instance

Dev2 VDB Branched
Source
Time Window
Dev1 VDB
Time W...
Virtual to Physical
Source

VDB

Instance

Instance

Spin VDB up
Before drop
Time Window

Corruption
Recovery
Business Intelligence
ETL and Refresh Windows
[Timeline figure: 8am, noon, 1pm, 10pm]
ETL and DW refreshes taking longer
[Timeline figure, 2011 to 2015: 8am, noon, 1pm, 10pm]
Database going Global
Globalization Reduces Windows
[Timeline figure, 2011 to 2015: 6am, 8am, noon, 1pm, 9pm, 10pm]
ETL and Refresh Windows
[Timeline figure, 2011 to 2015: 6am, 8am, noon, 1pm, 9pm, 10pm]
ETL and DW Refreshes
Prod

DW & BI

Instance

Instance

Data Guard – requires full refresh if used
Active Data Guard – rea...
Fast Refreshes
• Collect only Changes
• Refresh in minutes

Prod
Instance

BI

DW

Instance

Instance

ETL
24x7
Temporal Data
BI
a)

Fast refreshes

a)

Temporal queries

b)

Confidence testing
Review: Use Cases
1.

Development Acceleration
a)
b)
c)

2.

Quality
a)
b)
c)

3.

Full, Fresh, Fast , Self Serve
QA Branc...
perhaps the single largest
storage
consolidation opportunity
history“

over 10 times
Oracle 12c
80MB buffer cache ?
200GB
Cache
with

[Chart: Latency (ms) and Tnxs / min versus Users (1, 5, 10, 20, 30, 60, 100, 200); axis labels 5000 and 300 ms]
[Chart: Latency (ms) and Tnxs / min versus Users (1, 5, 10, 20, 30, 60, 100, 200); axis labels 8000 and 600 ms]
$1,000,000

$6,000
Database Virtualization
About Delphix
•
•
•
•

Founded in 2008, launched in 2010
CEO Jedidiah Yueh (founder of Avamar: >$1B revenue)
Based in Sil...
Business

Develop

IT

Storage

$27,000M
$850M
$75M

$40M
Good, Cheap, Fast : choose two

Good

Cheap

Fast
FS vs. ZFS

•
•
•

FS per Volume
FS limited bandwidth
Storage stranded

FS

FS

FS

Volume

Volume

Volume

•
•
•

Many FS...
Three Core Parts
Production
Instance

1
Copy
Sync
Snapshots
Purge
Time Flow

Storage

File System

2
Clone
(snapshot)
Comp...
OOUG: Database Virtualization
  • Kyle Hailey. I work for a company called Delphix. We write software that enables Oracle and SQL Server customers to copy their databases in 2 minutes with almost no storage overhead. We accomplish that by taking one initial copy and sharing the duplicate blocks across all the clones.
  • What are these technologies? Benefits and drawbacks. The technology is awesome and coming of age: Clonedb (3 presentations at OOW), SMU, OEM 12c DBaaS, 12c “clone” pluggable databases.
  • Want data now. Don’t understand DBAs. DBs are bigger and harder to copy. Devs want more copies. Reporting wants more copies. Everyone has storage constraints. If you can’t satisfy the business demands, your process is broken.
  • Prod is critical for the business. Performance of prod is top priority. Protect prod from load.
  • You might be familiar with this cycle that we’ve seen in the industry, where IT department budgets are being constrained. When IT budgets are constrained, one of the first targets is reducing storage. As storage budgets are reduced, the ability to provision database copies and development environments goes down. As development environments become constrained, projects start to hit delays. As projects are delayed, the applications that the business depends on to generate revenue to pay for IT budgets are delayed, which reduces revenue as the business cannot access new applications, which in turn puts more pressure on the IT budget. It becomes a vicious circle.
  • I don’t know if these situations ring a bell at your organization or if you can imagine some of these situations, but here are some of the issues we at Delphix are seeing in the industry with the companies we are talking to. Let’s look at the 5 points in more detail.
  • We talked to Presbyterian Healthcare, and they told us that they spend 96% of their QA cycle time building the QA environment and only 4% actually running the QA suite. This happens for every QA suite, meaning that for every dollar spent on QA there was only 4 cents of actual QA value and 96% of the cost was spent on infrastructure time and overhead.
  • Because of the time required to set up QA environments, the actual QA test suites lag behind the end of a sprint or code freeze, meaning that more time goes by between the introduction of a bug in code and the moment the bug is found. The more time that goes by after the introduction of a bug, the more dependent code is written on top of the bug, increasing the amount of code rework required after the bug is finally found. In his seminal book “Software Engineering Economics”, which some of you may be familiar with, Barry Boehm introduced the computer world to the idea that the longer one delays fixing a bug in the application design lifecycle, the more expensive it is to fix that bug, and these costs rise exponentially the later the bug is addressed in the cycle.
  • Not sure if you’ve run into this, but I have personally experienced the following. When I was talking to one group at Ebay, in that development group they shared a single copy of the production database between the developers on that team. What this sharing of a single copy of production meant is that whenever a developer wanted to modify that database, they had to submit their changes to code review, and that code review took 1 to 2 weeks. I don’t know about you, but that kind of delay would stifle my motivation, and I have direct experience with the kind of disgruntlement it can cause. When I was last a DBA, all schema changes went through me. It took me about half a day to process schema changes. That delay was too much, so it was unilaterally decided by the developers to go to an EAV schema, or entity attribute value schema, which meant that developers could add new fields without consulting me and without stepping on each other’s feet. It also meant that SQL code was unreadable and performance was atrocious. Besides creating developer frustration, sharing a database also makes refreshing the data difficult, as it takes a while to refresh the full copy and it takes even longer to coordinate a time when everyone stops using the copy to make the refresh. All this means that the copy rarely gets refreshed and the data gets old and unreliable.
  • Misguided attributing. Relax the constraints. http://martinfowler.com/bliki/NoDBA.html
  • To circumvent the problems of sharing a single copy of production, many shops we talk to create subsets. One company we talked to, RBS, spends 50% of its time copying databases and has to subset because there is not enough storage. The subsetting process constantly needs fixing and modification. Now, what happens when developers use subsets?
  • Stubhub (ebay) estimates that 20% of their production bugs arise from testing on subsets instead of full database copies.
  • The biggest and most pervasive problem we see is slow build times. Setting up a database copy for a development environment requires submitting a request to management, who has to review it. Then, if the request is granted, it is passed to the DBA, who has to coordinate with the sysadmin, who has to coordinate with the storage admin. In such a situation it makes sense that copying a large database would take a long time. But even when we talk to someone who uses Netapp storage snapshots, like Electronic Arts, they said that even using storage snapshots it took 2-4 days to get a database clone copy due to the coordination between DBA, sysadmin and storage admin. At many of the customers we talk to, provisioning a database clone copy takes weeks or months. One large global bank quoted typically 6 months to provision a database clone copy environment. Requirements: self service for app teams. Requirements: end-to-end automation. Metrics: # people, process, time for delivery. So far we have talked about the weight of infrastructure on app delivery. Of course, to control and manage that infrastructure, firms layer on a large set of bureaucratic processes: change control, approvals, procurement, governance, etc. So the operational and organizational hurdles then create an even bigger drag on IT and app development. Here’s an example from one banking customer. Once the app developer puts in a request for a new development environment, there’s at least a week-long wait for management approvals. Then the project DBA works with the sysadmin and storage groups for capacity. If more capacity needs to be allocated, it’s 3 more days. If more needs to be purchased, weeks or months. If a copy of production data is needed, the process needs to wait on a production DBA, who might be busy with production issues. Recovering the database to a specific point in time and configuration can also take days. It is very common for two weeks to pass between a developer request and a ready environment. The process can be repeated for multiple environments, for data refreshes, and for integration across multiple systems. Delphix turns stop signs into green lights. Provisioning, refresh, rollback, and data integration happen nearly instantly and do not trigger approvals from production systems or require additional storage. That is why KLA is able to deliver 5 times the output from its SAP teams. Without Delphix, it’s impossible for organizations to implement the level of agile processes they desire. The management of data, and the bureaucracy of data management, slows things down too much.
  • Due to the constraints of building clone copy database environments, one ends up in the “culture of no”, where developers stop asking for a copy of a production database because the answer is “no”. If the developers need to debug an anomaly seen on production, or if they need to write a custom module which requires a copy of production, they know not to even ask and just give up.
  • State of Colorado has 100 projects and can support 3. KLA-Tencor can only support 2 projects out of a dozen.
  • Slowdowns mean bottlenecks. These bottlenecks cause failures in IT projects. I’m into eliminating bottlenecks (whether it is wait events, tuning SQL or provisioning copies of DBs).
  • Fastest query is the query not run
  • Performance issues. Single point in time.
  • Physical File System: performance issues. Multiple points in time. Occasional rebuild.
  • Source Syncing*: initial backup once only; continual forever change collection; purging of old data. Storage (DxFS): shared blocks; snapshots, unlimited and storage agnostic; compression, typically to 1/3, compressing on block boundaries, with basically undetectable overhead; shared data in memory, super caching*. Self Service Automation: virtual database provisioning, rollback, refresh*, branching*, tagging*; mount files over NFS; init.ora, SID, database name, database unique name; security on who can see which source databases, how many clones they can make and how much storage they can use.
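The source syncing idea in the note above (one initial backup, then change collection forever, then purging of old data) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class `TimeFlow`, its methods and the block-level representation are hypothetical names chosen for this sketch and do not describe how Delphix or DxFS is implemented.

```python
from bisect import bisect_right


class TimeFlow:
    """Hypothetical model: one baseline copy plus an ordered log of block changes."""

    def __init__(self, baseline):
        self._baseline = dict(baseline)  # block number -> contents from the initial backup
        self._times = []                 # change timestamps, ascending
        self._changes = []               # (block number, contents), parallel to _times

    def record(self, timestamp, block, contents):
        """Append one changed block; changes must arrive in time order."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("changes must be recorded in time order")
        self._times.append(timestamp)
        self._changes.append((block, contents))

    def provision(self, timestamp):
        """Return the block image as of `timestamp` (inclusive)."""
        image = dict(self._baseline)
        cut = bisect_right(self._times, timestamp)
        for block, contents in self._changes[:cut]:
            image[block] = contents
        return image

    def purge(self, up_to):
        """Fold changes at or before `up_to` into the baseline, shrinking the time window."""
        cut = bisect_right(self._times, up_to)
        for block, contents in self._changes[:cut]:
            self._baseline[block] = contents
        del self._times[:cut]
        del self._changes[:cut]


flow = TimeFlow({0: "a", 1: "b"})
flow.record(10, 0, "a2")
flow.record(20, 1, "b2")
print(flow.provision(15))
```

A real thin clone would not materialize a full image as `provision` does here; the sketch copies the blocks only to keep the example short.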
  • Navigate to “setup -> provisioning patching -> storage registration”. Click the “Register” tab and choose storage, either Netapp or ZFS. Supply the storage information: Name (storage array name registered in DNS), Vendor, Protocol (http or https), Storage Credentials (credentials for interacting with storage). Install agents on a separate Linux machine to manage the Netapp or ZFS storage; an agent has to run on a Linux host to manage the storage. Supply the agent host and host credentials. Pick a database to make the test master. Put the test master on ZFS storage or Netapp storage. Register the ZFS storage or Netapp storage with OEM. Enable Snap Clone for the test master database. Set up a zone: set max CPU and memory for a set of hosts and the roles that can see these zones. Set up a pool: a pool is a set of machines where databases can be provisioned. Set up a profile: a source database that can be used for thin cloning. Set up a service template: reference values such as an init.ora for the database to be created.
  • Requires expert storage admins, specialized equipment and scripting. 2-7 days, or 2-8 hours if everyone is together. CERN recently gave a presentation where they wrote almost 30,000 lines of code: 13k lines & 15k lines of PHP.
  • The technology has existed for 15+ years. Why hasn’t there been more adoption?
  • Like the internet. The internet existed before the browser: ftp, bulletin boards, chat rooms, gopher, telnet, etc. It didn’t take off until the browser. Thin cloning didn’t take off until database virtualization.
  • Like the internet
  • In the physical database world, 3 clones take up 3x the storage. In the virtual world, 3 clones take up 1/3 the storage thanks to block sharing and compression.
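The 3x versus 1/3 comparison in the note above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 1/3 compression figure quoted elsewhere in these notes; the function names and the `changed_fraction` parameter (blocks a clone has modified and therefore stores privately) are assumptions added for illustration.

```python
def physical_storage_gb(db_size_gb, clones):
    """Every physical clone is a full copy of the source."""
    return db_size_gb * clones


def virtual_storage_gb(db_size_gb, clones, compression_ratio=1 / 3, changed_fraction=0.0):
    """One compressed shared copy, plus each clone's privately changed blocks.

    compression_ratio and changed_fraction are illustrative assumptions.
    """
    shared = db_size_gb * compression_ratio
    private = db_size_gb * changed_fraction * compression_ratio * clones
    return shared + private


size_gb = 100
print(physical_storage_gb(size_gb, clones=3))
print(virtual_storage_gb(size_gb, clones=3))
```

With no changed blocks, three virtual clones of a 100 GB source occupy about a third of the source size, against 300 GB for three physical copies. Raising `changed_fraction` shows how the saving shrinks as clones diverge.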
  • Saw it at OOW, then nothing until a couple of weeks ago. A customer had the beta but wouldn’t allow us to take screenshots.
  • The software installs on any x86 hardware, uses any storage, and supports Oracle 9.2-12c, standard edition, enterprise edition, single instance and RAC on AIX, Sparc, HPUX and Linux. It also supports SQL Server.
  • EMC, Netapp, Fujitsu, or newer flash storage like Violin, Pure Storage, Fusion IO, etc.
  • Delphix does a one time only copy of the source database onto Delphix
  • Quote from a customer: “Delphix GUI is what Oracle Enterprise Manager would look like if Apple had designed it”. The Delphix interface is user friendly, polished and easy to use.
  • Physically independent but logically correlated. Cloning multiple source databases at the same time can be a daunting task.
  • One example among our customers is Informatica, who had a project to integrate 6 databases into one central database. The time of the project was estimated at 12 months, with much of that coming from trying to orchestrate getting copies of the 6 databases at the same point in time. Like herding cats.
  • Informatica had a 12 month project to integrate 6 databases. After installing Delphix they did it in 6 months. “I delivered this early. I generated more revenue. I freed up money and put it into innovation.” Won an award with Ventana Research for this project.
  • Presbyterian went from 10 hour builds to 10 minute builds. Total investment in test environment: $2M/year. 10 QA engineers. DBA and storage team dedicated to support testing. App, Oracle server, storage, backups. Restore load competes with backup jobs. Requirements: fast data refresh, rollback. Data delivery takes 480 out of a 500 minute test cycle (4% value). $.04/$1.00 actual testing vs. setup.
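The 4% figure in the note above follows directly from the 480 and 500 minute numbers. A minimal sketch of that arithmetic, with a hypothetical helper name `qa_time_split` introduced only for this example:

```python
def qa_time_split(setup_minutes, cycle_minutes):
    """Return the share of a QA cycle spent on setup versus actual testing."""
    if cycle_minutes <= 0 or not 0 <= setup_minutes <= cycle_minutes:
        raise ValueError("setup must be between 0 and the cycle length")
    setup_share = setup_minutes / cycle_minutes
    testing_share = 1 - setup_share
    return {
        "setup_share": setup_share,
        "testing_share": testing_share,
        # Cents of actual testing bought by each dollar spent on QA.
        "testing_cents_per_dollar": round(testing_share * 100, 2),
    }


split = qa_time_split(setup_minutes=480, cycle_minutes=500)
print(split)
```

Plugging in a 10 minute build instead of a 480 minute one shows how much of the cycle is returned to testing.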
  • For example, Stubhub went from 5 copies of production in development to 120, giving each developer their own copy.
  • Stubhub estimated a 20% reduction in bugs that made it to production
  • Multiple scripted dumps or RMAN backups are used to move data today. With application awareness, we only request change blocks—dramatically reducing production loads by as much as 80%. We also eliminate the need for DBAs to manage custom scripts, which are expensive to maintain and support over time.
  • Developers each get a copy: fast, fresh, full, frequent. Self service. QA branches from Development. Federated cloning is easy. Forensics. A/B testing. Recovery: logical and physical. Development: provision and refresh; full, fresh, frequent (many); source control for code, data control for the database; data version per release version; federated cloning. QA: fork copies off to QA; fork copies back to Dev; instant replay (set up and run destructive tests); performance A/B; upgrade patching. Recovery: backup of 50 days in the size of 1 copy, continuous data protection (use recent slide on backup schedules: full, incr, incr, incr, full); restore; logical recovery on prod; logical recovery on Dev. Debugging: debug on clone instead of prod; debug on data at the time of a problem. Validate physical integrity (test for physical corruption).
  • Change mentality from few as possible to as many as accelerates the businessRemember Jinga ?
  • Talking of eliminating 3 PB at one customer and potentially expanding that to 50 PB in the next 1-2 years. With 380,000 customers with an average of 10 database copies. If every MB was an inch: 300,000 customers, 12 copies on average, 100 GB avg size. 300000*12*100 = 360,000,000 GB. 300000*1*.3*100 = 9,000,000 GB. Difference: 351 PB. 1,191,290,000 feet to the moon, 132,000,000 feet around the earth. 15,133,979,520 inches to the moon.
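The 351 PB estimate in the note above can be reproduced as follows. This is a sketch of the arithmetic only; it assumes decimal units (1 PB = 1,000,000 GB), which is what the round numbers in the note imply, and the function name is hypothetical.

```python
GB_PER_PB = 1_000_000  # decimal units, an assumption matching the note's round numbers


def total_gb(customers, copies, avg_db_gb, compression_ratio=1.0):
    """Total storage across all customers, in GB."""
    return customers * copies * avg_db_gb * compression_ratio


# Physical world: 300,000 customers, 12 full copies each, 100 GB average size.
physical_gb = total_gb(300_000, 12, 100)
# Virtual world: one shared copy per customer, compressed to 0.3 of its size.
virtual_gb = total_gb(300_000, 1, 100, compression_ratio=0.3)

saved_pb = (physical_gb - virtual_gb) / GB_PER_PB
print(f"physical: {physical_gb / GB_PER_PB:.0f} PB, "
      f"virtual: {virtual_gb / GB_PER_PB:.0f} PB, saved: {saved_pb:.0f} PB")
```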
  • HD: 720TB down to 8TB (create 19 x 36TB VDBs). CISCO: eliminate 50 PB.
  • Macys: 4000 hours/year cloning down to 8 hours/year. KLA-Tencor: over doubled project output, like taking a 100 person team and making it a 200 person team. Informatica: finished 2x faster. Stubhub: 2x as many releases a year + 20% fewer bugs. Quality Comcast Analytics. Holland America: eliminated refreshes saturating infrastructure. Windriver: “ “
  • Moral of this story: instead of dragging behind enormous amounts of infrastructure and bureaucracy required to provide database copies, using database virtualization eliminates the drag and provides power and acceleration to your company. Defining moment. Competitors. Services.
  • One last thing: http://www.dadbm.com/wp-content/uploads/2013/01/12c_pluggable_database_vs_separate_database.png
  • 250 PDBs x 200 GB = 50 TB. EMC sells 1 GB for $1,000. Dell sells 32 GB for $1,000. A terabyte of RAM on a Dell costs around $32,000. A terabyte of RAM on a VMAX 40k costs around $1,000,000.
  • http://www.emc.com/collateral/emcwsca/master-price-list.pdf These prices are found on pages 897/898: the storage engine for VMAX 40k with 256 GB RAM is around $393,000; the storage engine for VMAX 40k with 48 GB RAM is around $200,000. So the cost of RAM here is 193,000 / 208 = $927 a gigabyte. That seems like a good deal for EMC, as Dell sells 32 GB RAM DIMMs for just over $1,000. So a terabyte of RAM on a Dell costs around $32,000, and a terabyte of RAM on a VMAX 40k costs around $1,000,000. 2) Most DBs have a buffer cache that is less than 0.5% (not 5%, 0.5%) of the datafile size.
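The RAM price comparison in the note above is a difference quotient between two list prices. A minimal sketch of that calculation, using only the figures quoted in the note; the function name is hypothetical and 1 TB is taken as 1024 GB, which is an assumption.

```python
def price_per_gb(price_high, ram_high_gb, price_low, ram_low_gb):
    """Implied price of one GB of RAM from two configurations that differ in RAM."""
    return (price_high - price_low) / (ram_high_gb - ram_low_gb)


GB_PER_TB = 1024  # assumption: binary terabyte

# VMAX 40k storage engine: 256 GB for about $393,000 versus 48 GB for about $200,000.
vmax_per_gb = price_per_gb(393_000, 256, 200_000, 48)
# Dell: a 32 GB DIMM for roughly $1,000.
dell_per_gb = 1_000 / 32

print(f"VMAX: ${vmax_per_gb:,.0f}/GB, ${vmax_per_gb * GB_PER_TB:,.0f}/TB")
print(f"Dell: ${dell_per_gb:,.2f}/GB, ${dell_per_gb * GB_PER_TB:,.0f}/TB")
```

The VMAX figure comes out a little under the note's rounded $1,000,000 per terabyte, and the Dell figure matches the $32,000 quoted.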
  • Reduces storage. Relieves DBAs of repetitive work so they can focus on innovation. Accelerates development. Eliminates bottlenecks: more code, faster and of better quality.
  • Founded in 2008, launched in 2010. Jedidiah Yueh, President and CEO: founded Avamar in 1999, sold to EMC in 2006, VP Product Mgmt at EMC. Avamar: >$1B revenue. 150 employees: HQ in Menlo Park, SF, Boston, DC, London, NY and Atlanta. Growing 250% annually. 130+ customers including 100 Fortune 1000 customers.
  • Tightening, constraining resources. Cascading effect on companies. The business doesn’t know or understand this DBA work. DBAs are often the hardest resource for IT to justify because they are invisible. DBAs are already being asked to do a tremendous amount. DBAs are often on call 24x7. DBAs are foundational.
  • http://mobile.stufffundieslike.com/2010/02/upping-the-ante-on-doctrinal-disagreements/
  • There is a saying in the industry that we want “good, cheap, fast: choose two”, meaning we want to build applications quickly (fast), we want those applications to have good functionality, and we want those applications to be cheap to build, but we can’t have all three.

    1. 1. Agile Data Platform: Revolutionizing Database Cloning Kyle Hailey http://kylehailey.com
    2. 2. Problem in IT Get the right data To the right people At the right time
    3. 3. Sandbox Development QA Production UAT Business Intelligence Forensics Backup Data Guard Tape
    4. 4. Part I : Cloning Technology Physical Thin Clone Virtual Part II : Agile Data Acceleration
    5. 5. Database Cloning Challenge If you can’t satisfy the business demands then your process is broken.
    6. 6. Problem Reports Production First copy QA and UAT • CERN - European Organization for Nuclear Research • 145 TB database • 75 TB growth each year • Dozens of developers want copies. Developers
    7. 7. Tradeoff: Speed, Quality, Cost
    8. 8. What We’ve Seen “if you can't measure it you can’t manage it” 1. 2. 3. 4. 5. Inefficient QA: Higher costs of QA QA Delays : Greater re-work of code Sharing DB Environments : Bottlenecks Using DB Subsets: More bugs in Prod Slow Environment Builds: Delays
    9. 9. 1. Inefficient QA: Long Build times Build QA Test Build Time 96% of QA time was building environment $.04/$1.00 actual testing vs. setup
    10. 10. 2. QA Delays: bugs found late require more code re-work [Diagram: Build QA Env and QA lag behind Sprints 1 to 3; bug introduced in code] [Chart: Cost To Correct (0 to 70) versus Delay in Fixing the bug (1 to 7); Software Engineering Economics, Barry Boehm (1981)]
    11. 11. 3. Full Copy Shared : Bottlenecks Old Unrepresentative Data Frustration Waiting
    12. 12. 4. Subsets : cause bugs
    13. 13. 4. Subsets : cause bugs The Production ‘Wall’ Classic problem is that queries that run fast on subsets hit the wall in production. Developers are unable to test against all data
    14. 14. 5. Slow Environment Builds: 3-6 Months to Deliver Data Developers Management Submit Request Approve Request $$ (2 Weeks) Approve Request $$ (2 Weeks) Approve Request $$ (1 Week) (2 Days) DBA System Admin (3 Days) (3 Days) Disk Capacity? Storage Admin (3 Days) Begin Work …….1-2 Weeks of Approvals, Delays, and Provisioning…… Request Additional Storage? File System Configured? Coordinate Replication w/ Infrastructure Configure LUNS & Build File System ReParameterize & Configure DB Mount Recovery DB to Specific PIT (3 Days) Provision Capacity 15
    15. 15. 5. Slow Environment Builds: culture of no DBA Developer
    16. 16. Never enough environments
    17. 17. bottlenecks
    18. 18. What We’ve Seen 1. 2. 3. 4. 5. Inefficient QA: Higher costs QA Delays : Increased re-work Sharing DB : Bottlenecks Subset DB : Bugs Slow Environment Builds: Delays
    19. 19. 99% of blocks are identical Clone 1 Clone 2 Clone 3
    20. 20. Thin Clone Clone 1 Clone 2 Clone 3
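The idea behind slides 19 and 20 (clones whose blocks are almost all identical can share one copy) can be sketched as a tiny data structure. This is an illustrative model only; the class `ThinClone` and its methods are hypothetical names and do not correspond to any of the products discussed.

```python
class ThinClone:
    """A clone that shares the source's blocks and stores only its own changes."""

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared with every other clone, never modified
        self._delta = {}          # block number -> this clone's private version

    def read(self, block):
        """Prefer the private version of a block, falling back to the shared one."""
        return self._delta.get(block, self._base[block])

    def write(self, block, contents):
        """Writes never touch the shared blocks; they land in the private delta."""
        self._delta[block] = contents

    def private_blocks(self):
        return len(self._delta)


base = tuple(f"block-{i}" for i in range(100))
clones = [ThinClone(base) for _ in range(3)]
clones[0].write(7, "changed")

print(clones[0].read(7), clones[1].read(7))
print(sum(c.private_blocks() for c in clones), "private block(s) across 3 clones")
```

Three clones of a 100 block source hold 100 shared blocks plus one private block, instead of 300 blocks.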
    21. 21. 2. Thin Cloning I. Clonedb Oracle II. EMC • Copy on first write (COFW) III. Netapp • write anywhere file system (WAFL) • & EMC VNX redirect on write (ROW) IV. ZFS
    22. 22. I. clonedb dNFS sparse file RMAN backup
    23. 23. I. clonedb dNFS sparse file RMAN backup
    24. 24. CloneDB
        1. dNFS 11.2.0.2+
           – cd $ORACLE_HOME/rdbms/lib
           – make -f ins_rdbms.mk dnfs_on
        2. Clonedb.pl initSOURCE.ora output.sql
           – MASTER_COPY_DIR="/rman_backup"
           – CLONE_FILE_CREATE_DEST="/nfs_mount"
           – CLONEDB_NAME="clone"
        3. sqlplus / as sysdba @output.sql
           – startup nomount PFILE=initclone.ora
           – Create control file backup location
           – dbms_dnfs.clonedb_renamefile ('/backup/file.dbf' , '/clone/file.dbf');
           – alter database open resetlogs;
        Tim Hall, www.oracle-base.com/articles/11g/clonedb-11gr2.php
    25. 25. Thin Cloning I. Clonedb Oracle II. EMC • Copy on first write (COFW) III. Netapp • write anywhere file system (WAFL) • & EMC VNX redirect on write (ROW) IV. ZFS
    26. 26. II. EMC Copy on Write Active File System Snapshot A B C D
    27. 27. II. EMC Copy on Write Active File System Snapshot A B C D Write penalty (read and two writes) Limit 16 snapshots No Branching (snapshots of snapshots) D
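
The "write penalty (read and two writes)" can be made concrete with a toy model of copy on first write. This is a simplified sketch, not EMC's implementation: the class and its counters are invented for illustration, and a real array works on disk blocks rather than Python lists.

```python
class CopyOnFirstWriteVolume:
    """Toy model of copy-on-first-write (COFW) snapshots.

    After a snapshot, the first write to a block must first read the
    original block and write it to a snapshot save area, then write the
    new data: one read plus two writes. Later writes to the same block
    cost a single write.
    """

    def __init__(self, blocks):
        self.active = list(blocks)   # the active file system
        self.snapshot_area = {}      # block index -> preserved original
        self.snapshot_taken = False
        self.reads = 0
        self.writes = 0

    def take_snapshot(self):
        self.snapshot_taken = True
        self.snapshot_area = {}

    def write(self, index, data):
        if self.snapshot_taken and index not in self.snapshot_area:
            original = self.active[index]          # extra read
            self.reads += 1
            self.snapshot_area[index] = original   # extra write
            self.writes += 1
        self.active[index] = data                  # the write itself
        self.writes += 1

    def read_snapshot(self, index):
        # Unchanged blocks are still read from the active file system.
        return self.snapshot_area.get(index, self.active[index])


if __name__ == "__main__":
    vol = CopyOnFirstWriteVolume(["A", "B", "C", "D"])
    vol.take_snapshot()
    vol.write(3, "D'")
    print(vol.reads, vol.writes, vol.read_snapshot(3), vol.active[3])
```

In this model the snapshot view of block D stays intact while the active file system sees the new data, at the cost of the extra read and write on the first change.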
    28. 28. I. Clonedb Oracle II. EMC • Copy on first write (COFW) III. Netapp • write anywhere file system (WAFL) • & EMC VNX redirect on write (ROW) IV. ZFS
    29. 29. III. Netapp and EMC VNX root Data Blocks • 255 snapshots • Branching possible
    30. 30. I. Clonedb Oracle II. EMC • Copy on first write (COFW) III. Netapp • write anywhere file system (WAFL) • & EMC VNX redirect on write (ROW) IV. ZFS
    31. 31. IV. ZFS Allocate on Write Snapshot root Live root Zil Intent Log Unlimited Instantaneous Snapshots Unlimited Instantaneous Clones Branching easy and unlimited
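
Allocate on write and easy branching can be sketched with a block store in which data is never overwritten in place. The model below is an assumption-laden illustration, not how ZFS is implemented: the class names are invented, and copying a pointer map stands in for ZFS keeping a reference to the old root block.

```python
class BlockStore:
    """Shared pool of blocks; new data always gets a newly allocated block."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def allocate(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id


class Dataset:
    """A live file system or clone: just a map from names to block ids."""

    def __init__(self, store, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})

    def write(self, name, data):
        # Allocate on write: never overwrite, only repoint to a new block.
        self.block_map[name] = self.store.allocate(data)

    def read(self, name):
        return self.store.blocks[self.block_map[name]]

    def snapshot(self):
        # A snapshot is a frozen set of pointers; no data blocks are copied.
        return dict(self.block_map)

    @classmethod
    def clone(cls, store, snapshot):
        # A clone is a writable dataset that starts from a snapshot's pointers.
        return cls(store, snapshot)


if __name__ == "__main__":
    store = BlockStore()
    live = Dataset(store)
    for name in "ABCD":
        live.write(name, name)

    dev = Dataset.clone(store, live.snapshot())   # clone of the live data
    dev.write("D", "D-dev")
    qa = Dataset.clone(store, dev.snapshot())     # branch: clone of a clone
    qa.write("A", "A-qa")

    print(live.read("D"), dev.read("D"), qa.read("D"), qa.read("A"))
    print(len(store.blocks), "blocks stored in total")
```

Because each clone only adds the blocks it changes, a snapshot of a clone can itself be cloned, which is the branching the slide describes.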
    32. 32. 2. Thin Cloning I. Clonedb Oracle II. EMC III. Netapp IV. ZFS. Steps: 1. Put database in hot backup 2. Take Snapshot 3. Clone Snapshot (ZFS & Netapp) 4. Export Clone 5. Mount on target host. Diagram: Source Instance and Database Luns on the Production Filer, with snapshot and clones; Target A, Target B, Target C, each running several Instances
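
The five steps can be written out as a command plan. The sketch below only builds the plan as text and runs nothing. It assumes a ZFS filer; the dataset, snapshot, filer, and mount point names are hypothetical, the NFS path assumes the default ZFS mount point, and the `END BACKUP` step is an addition not shown on the slide (a database is normally taken out of hot backup mode once the snapshot exists).

```python
def thin_clone_plan(dataset, snapshot, clone, filer, mount_point):
    """Return (where, command) pairs for a snapshot-based thin clone.

    All names passed in are hypothetical examples. 'sql' steps run on the
    source database, 'filer' steps on the storage host, 'target' steps on
    the host that will run the cloned instance.
    """
    snap = f"{dataset}@{snapshot}"
    return [
        ("sql", "ALTER DATABASE BEGIN BACKUP;"),        # 1. hot backup mode
        ("filer", f"zfs snapshot {snap}"),              # 2. take snapshot
        ("sql", "ALTER DATABASE END BACKUP;"),          # assumption: not on the slide
        ("filer", f"zfs clone {snap} {clone}"),         # 3. clone snapshot
        ("filer", f"zfs set sharenfs=on {clone}"),      # 4. export clone over NFS
        # 5. mount on target; path assumes the default mount point /<dataset>
        ("target", f"mount -t nfs {filer}:/{clone} {mount_point}"),
    ]


if __name__ == "__main__":
    plan = thin_clone_plan(
        dataset="tank/oradata",        # hypothetical dataset holding the datafiles
        snapshot="snap1",
        clone="tank/oradata_clone1",
        filer="filer1",
        mount_point="/mnt/clone1",
    )
    for where, command in plan:
        print(f"[{where}] {command}")
```

Once the clone is mounted, the target host still has to recover and rename the database, as covered under "Mount, recover, rename" in the Three Core Parts slides.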
    33. 33. Problem: How do you get data off Production? Target A Instance Instance Development Filer Production Filer clones snapshot Instance Target B Database LUNs Instance Instance Target C Instance Instance Instance
    34. 34. Three Core Parts: 1. Copy, Sync (Production Instance) 2. Clone (snapshot) (Storage File System) 3. Mount, recover, rename (Development Instance)
    35. 35. Three Core Parts Production Instance 1 1. Copy 2. Sync Storage File System Development Instance
    36. 36. Netapp 1 SnapManager Repository DBA Snap Manager Protection Manager tr-3761.pdf RMAN Repository Snap Manager Flex Clone Production Snap Drive Snap Mirror Storage Admin Development
    37. 37. Netapp 1 Target A NetApp Filer - Production Production NetApp Filer - Development Snap Protection mirror Manage Snap Drive Flexclone Instance Instance Target B Database Luns Instance Instance Snapshot Manager for Oracle Repository Database Development Target C Instance Instance Instance
    38. 38. Three Core Parts: Production Instance, Storage File System, Development Instance. 3. Mount, recover, rename; Roles & security
    39. 39. 3 Oracle EM 12c Snap Clone. Setup steps: • Register Netapp or ZFS with Storage Credentials • Install agents on a LINUX machine to manage the Netapp or ZFS storage • Register test master database • Enable Snap Clone for the test master database • Set up a zone – set max CPU and Memory and the roles that can see these zones • Set up a pool – a pool is a set of machines where databases can be provisioned • Set up a profile – a source database that can be used for thin cloning • Set up a service template – init.ora values. Diagram: EM 12c, Agents (Linux), Source instance, Test Master instance, Clone instance, Profile, Pool, Template, Zone, ZFS or NetApp
    40. 40. Where we Are Production Instance Instance Database File system File system QA UAT Instance Instance Instance Instance Instance Instance Database Database Database File system File system File system File system File system File system File system Development
    41. 41. Want to be here Production Instance Instance Database Development QA UAT Instance Instance Instance Instance Instance Instance Database Database File system Snapshots Database
    42. 42. EM 12c: Snap Clone Production Flexclone Development Flexclone Netapp Snap Manager for Oracle
    43. 43. Thin Cloning
    44. 44. 3. Database Virtualization
    45. 45. Three Physical Copies Three Virtual Copies Data Virtualization Appliance
    46. 46. SMU ZFS Storage Appliance Choose your virtualization Layer: • Delphix and Oracle SMU
    47. 47. Oracle 12c SMU Oracle Snap Management Utility for ZFS Appliance • Requires ZFS Appliance • Supports Linux , Solaris 10+, Windows 2008+ • GUI – snapshot source databases – provision virtual databases
    48. 48. Install Delphix on x86 hardware Intel hardware
    49. 49. Allocate Any Storage to Delphix Allocate Storage Any type
    50. 50. One time backup of source database Production Supports Instance Database File system Upcoming
    51. 51. DxFS (Delphix) Compress Data Production Instance Database File system Data is compressed typically 1/3 size
    52. 52. Incremental forever change collection Production Instance Database Changes Time Window File system • Collected incrementally forever • Old data purged
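
The idea of collecting changes incrementally forever while purging old data can be sketched as a rolling time window. This is a generic illustration under stated assumptions, not Delphix's implementation: the class name, the integer timestamps, and the key/value "rows" are all invented for the example.

```python
class TimeWindow:
    """Toy change log: one baseline plus timestamped changes.

    Changes are only ever appended (incremental forever). Purging folds
    changes older than the retention period into the baseline, so any
    point in time inside the window can still be reconstructed.
    """

    def __init__(self, baseline, retention):
        self.baseline = dict(baseline)   # state at the start of the window
        self.changes = []                # (timestamp, key, value), ascending
        self.retention = retention

    def collect(self, timestamp, key, value):
        if self.changes and timestamp < self.changes[-1][0]:
            raise ValueError("changes must arrive in time order")
        self.changes.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it was at the given point in time."""
        state = dict(self.baseline)
        for ts, key, value in self.changes:
            if ts > timestamp:
                break
            state[key] = value
        return state

    def purge(self, now):
        """Fold changes older than the retention period into the baseline."""
        cutoff = now - self.retention
        kept = []
        for ts, key, value in self.changes:
            if ts <= cutoff:
                self.baseline[key] = value
            else:
                kept.append((ts, key, value))
        self.changes = kept


if __name__ == "__main__":
    window = TimeWindow({"row1": "v0"}, retention=10)
    window.collect(1, "row1", "v1")
    window.collect(5, "row2", "x")
    window.collect(12, "row1", "v2")
    print(window.state_at(6))
    window.purge(now=13)
    print(len(window.changes), "changes kept after purge")
```

A virtual copy provisioned "as of" some time corresponds to `state_at` here, and the retention setting bounds how far back that time can be.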
    53. 53. Typical Architecture Production Instance Instance Database File system File system QA UAT Instance Instance Instance Instance Instance Instance Database Database Database File system File system File system File system File system File system File system Development
    54. 54. With Delphix Production Instance Instance Database File system Development QA UAT Instance Instance Instance Instance Instance Instance Database Database Database
    55. 55. Three Core Parts 1 2 Production 3 Development Instance Instance Time Window Source Syncing Storage (DxFS)Self Service
    56. 56. Fast, Fresh, Full Source Development VDB Instance Instance Time Window
    57. 57. Free Instance Source Instance Source Time Window Instance Instance gif by Steve Karam
    58. 58. Self Service
    59. 59. Branching Dev VDB Source Instance Instance Source Time Window Dev1 VDB Time Window End of Sprint Or a Code Freeze QA VDB (branched from Dev) Instance
    60. 60. Federated Cloning
    61. 61. Federated Instance Source1 Instance Source2 Source1 Time Window Instance Source2 Time Window Instance
    62. 62. “I looked like a hero” Tony Young, CIO Informatica
    63. 63. DevOps
    64. 64. DevOps With Delphix 1. Efficient QA: Low cost, high utilization 2. Quick QA : Fast Bug Fix 3. Every Dev gets DB: Parallelized Dev 4. Full DB : Fewer Bugs 5. Fast Builds: Fast Dev, Culture of Yes
    65. 65. 1. Efficient QA: Lower cost. [Figure: Build Time vs. QA Test bars] 1% of QA time was building environment; $.99/$1.00 actual testing vs. setup
    66. 66. Rapid QA via Branching
    67. 67. 2. QA Immediate: bugs found fast and fixed. [Figure: two timelines across Sprint 1, Sprint 2, and Sprint 3 with Build QA Env, QA, Bug (X), and Code phases]
    68. 68. 3. Private Copies: Parallelize
    69. 69. 4. Full Size DB : Eliminate bugs
    70. 70. 5. Self Service: Fast, Efficient. Culture of Yes! Roles: Developers, Management, DBA, System Admin, Storage Admin. Steps: Submit Request; Approve Request $$ (2 Weeks); Approve Request $$ (2 Weeks); Approve Request $$ (1 Week); Disk Capacity?; Begin Work; Request Additional Storage?; File System Configured?; Provision Capacity; Coordinate Replication w/ Infrastructure; Configure LUNS & Build File System; ReParameterize & Configure DB; Mount Recovery DB to Specific PIT. Step durations shown: (2 Days), (3 Days), (3 Days), (3 Days), (3 Days). …….1-2 Weeks of Approvals, Delays, and Provisioning……
    71. 71. Quality • Forensics • A/B testing • Recovery
    72. 72. Investigate Production Bugs Development Instance Instance Time Window Anomaly on Prod Possible code bug At noon yesterday Spin up VDB of Prod as it was during anomaly
    73. 73. Rewind for patch and QA testing Prod Development Instance Instance Time Window Time Window
    74. 74. A/B testing: Test A with Index 1, Test B with Index 2. • Keep tests for compare • Production vs Virtual – invisible index on Prod – Creating index on virtual • Flashback vs Virtual
    75. 75. Surgical recovery of Production Source Development Instance Instance Spin VDB up Before drop Time Window Problem on Prod Dropped Table Accidentally
    76. 76. Surgical or Full Recovery on VDB VDB Dev1 Source Instance Instance Dev2 VDB Branched Source Time Window Dev1 VDB Time Window Instance
    77. 77. Virtual to Physical Source VDB Instance Instance Spin VDB up Before drop Time Window Corruption
    78. 78. Recovery
    79. 79. Business Intelligence
    80. 80. ETL and Refresh Windows 1pm noon 10pm 8am
    81. 81. ETL and DW refreshes taking longer [Chart: years 2011 to 2015; time markers 8am, noon, 1pm, 10pm]
    82. 82. Database going Global
    83. 83. Globalization Reduces Windows [Chart: years 2011 to 2015; time markers 8am, noon, 1pm, 9pm, 10pm]
    84. 84. 6am 8am 10pm
    85. 85. ETL and Refresh Windows [Chart: years 2011 to 2015; time markers 6am, 8am, noon, 1pm, 9pm, 10pm]
    86. 86. ETL and DW Refreshes Prod DW & BI Instance Instance Data Guard – requires full refresh if used Active Data Guard – read only, most reports don’t work
    87. 87. Fast Refreshes • Collect only Changes • Refresh in minutes Prod Instance BI DW Instance Instance ETL 24x7
    88. 88. Temporal Data
    89. 89. BI a) Fast refreshes b) Temporal queries c) Confidence testing
    90. 90. Review: Use Cases 1. Development Acceleration a) Full, Fresh, Fast, Self Serve b) QA Branching c) Federated 2. Quality a) Forensics b) Testing : A/B, upgrade, patch c) Recovery: logical, physical 3. BI a) Fast refresh b) Temporal Data c) Confidence testing
    91. 91. “perhaps the single largest storage consolidation opportunity in history” over 10 times
    92. 92. Oracle 12c
    93. 93. 80MB buffer cache ?
    94. 94. 200GB Cache
    95. 95. with Latency [Chart: Tnxs / min (axis to 5000) and latency (axis to 300 ms) vs. Users: 1, 5, 10, 20, 30, 60, 100, 200]
    96. 96. Latency [Chart: Tnxs / min (axis to 8000) and latency (axis to 600 ms) vs. Users: 1, 5, 10, 20, 30, 60, 100, 200]
    97. 97. $1,000,000 $6,000
    98. 98. Database Virtualization
    99. 99. About Delphix • Founded in 2008, launched in 2010 • CEO Jedidiah Yueh (founder of Avamar: >$1B revenue) • Based in Silicon Valley, Global Operations • 10% of Fortune 500
    100. 100. Business Develop IT Storage $27,000M $850M $75M $40M
    101. 101. Good, Cheap, Fast : choose two Good Cheap Fast
    102. 102. FS vs. ZFS. FS (one FS per Volume): • FS per Volume • FS limited bandwidth • Storage stranded. ZFS (ZFS Storage Pool): • Many FS in a pool • Grow automatically • All bandwidth
    103. 103. Three Core Parts: 1. Production Instance: Copy, Sync, Snapshots, Purge, Time Flow 2. Storage File System: Clone (snapshot), Compress, Share Cache, Storage Agnostic 3. Development Instance: Mount, recover, rename; Self Service, Roles & Security; Rollback & Refresh; Branch & Tag
