Nyoug delphix slideshare

  • Work for a company called Delphix. We write software that enables Oracle and SQL Server customers to copy their databases in 2 minutes with almost no storage overhead. We accomplish that by taking one initial copy and sharing the duplicate blocks across all the clones. Expected a vt100 interface -> got an Apple-slick interface. Concerned about NFS performance -> banged on it for 2 years. What is agile data? How does that change the industry? How do you get data where you need it, like Hadoop? Sure, file system snapshots exist, but they are only available to sites with Netapp or EMC. Can change your career: rock star DBA; DBA manager -> director; director -> VP; VP -> CTO.
  • If you look at what's really impeding flow from development to operations to the customer, it's typically IT operations. Operations can never deliver environments on demand; you have to wait months or quarters to get a test environment. When that happens, terrible things happen: people actually hoard environments. They invite people onto their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn't actually achieve the goal. "One of the best predictors of DevOps performance is that IT Operations can make environments available on demand to Development and Test, so that they can build and test the application in an environment that is synchronized with Production." One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them. -- Eliyahu Goldratt
  • IT bottlenecks. Setting priorities. Company goals. Defining metrics. Fast iterations. The IT version of "The Goal" by E. Goldratt.
  • Get the right data, to the right people, at the right time.
  • Users want data now. They don't understand DBAs. Databases are bigger and harder to copy. Devs want more copies. Reporting wants more copies. Everyone has storage constraints. If you can't satisfy the business demands, your process is broken.
  • Moving the data IS the big gorilla. This gorilla of a data tax is hitting your bottom line hard.
  • Probably nothing is more onerous for a DBA than to hear "can you get me a copy of the production database for my project?" RMAN vs. Delphix: I was running out of space for the RMAN live demo! When moving data is too hard, the data in non-production systems such as reporting, development, or QA becomes older, and the older the data, the less actionable intelligence your BI or analytics can give you.
  • Example: some customers have over 1 petabyte of duplicate data (1,000 TB, i.e. 1,000,000 GB).
  • We know from our experience that there are some $1B+ data center consolidation price tags. Taking even 30% of the cost out of that, and cutting the timeline, is a strong and powerful way to improve margin. What about really big problems like consolidating data center real estate, or moving to the cloud? If you can non-disruptively collect the data, and easily and repeatedly present it in the target data center, you take huge chunks out of these migration timelines. Moreover, with data being so easy to move on demand, you neutralize the hordes of users who insist that there isn't enough time to do this, or it's too hard, or too risky. Annual time spent copying databases can measure in the 1000s of hours just for DBAs, not including all the other personnel required to supply the necessary infrastructure.
  • Data gets old because it is not refreshed. Instead of running 5 tests in two weeks (because it takes me 2 days to roll back after each of my 1-hour tests) and paying the cost of bugs slipping into production, what if I could run 15 tests in that same two weeks and have no bugs at all in production?
  • And they told us that they spend 96% of their QA cycle time building the QA environment, and only 4% actually running the QA suite. This happens for every QA suite, meaning that for every dollar spent on QA there was only 4 cents of actual QA value: 96% of the cost is infrastructure time and overhead.
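That 96/4 split can be sanity-checked with the cycle-time figures quoted elsewhere in the deck (480 of 500 minutes spent on setup); a minimal sketch:

```python
# QA cycle split described in the talk: environment build vs. actual testing.
cycle_minutes = 500   # total QA cycle (figure from the slides)
setup_minutes = 480   # time spent building the QA environment

setup_fraction = setup_minutes / cycle_minutes
test_fraction = 1 - setup_fraction   # value delivered per $1.00 of QA spend

print(f"setup {setup_fraction:.0%}, testing value ${test_fraction:.2f} per $1.00")
# -> setup 96%, testing value $0.04 per $1.00
```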
  • Because of the time required to set up QA environments, the actual QA test suites lag behind the end of a sprint or code freeze, meaning that the amount of time between the introduction of a bug in the code and its discovery increases. The more time that goes by after the introduction of a bug, the more dependent code is written on top of it, increasing the amount of code rework required after the bug is finally found. In his seminal book "Software Engineering Economics," which some of you may be familiar with, author Barry Boehm introduced the computing world to the idea that the longer one delays fixing a bug in the application design lifecycle, the more expensive it is to fix that bug, and these costs rise exponentially the later the bug is addressed in the cycle.
  • Not sure if you've run into this, but I have personally experienced the following. When I was talking to one group at Ebay, that development group shared a single copy of the production database between the developers on the team. What this sharing of a single copy of production meant is that whenever a developer wanted to modify that database, they had to submit their changes to code review, and that code review took 1 to 2 weeks. I don't know about you, but that kind of delay would stifle my motivation, and I have direct experience with the kind of disgruntlement it can cause. When I was last a DBA, all schema changes went through me. It took me about half a day to process schema changes. That delay was too much, so the developers unilaterally decided to go to an EAV (entity-attribute-value) schema, which meant that developers could add new fields without consulting me and without stepping on each other's feet. It also meant that the SQL code was unreadable and performance was atrocious. Besides creating developer frustration, sharing a database also makes refreshing the data difficult, as it takes a while to refresh the full copy, and it takes even longer to coordinate a time when everyone stops using the copy to make the refresh. All this means the copy rarely gets refreshed and the data gets old and unreliable.
  • To circumvent the problems of sharing a single copy of production, many shops we talk to create subsets. One company we talked to spends 50% of its time copying databases; they have to subset because there is not enough storage, and the subsetting process constantly needs fixing and modification. Now, what happens when developers use subsets?
  • Subsets instead of full database copies.
  • If Walmart in New York sold Lego Batman like hotcakes the morning it came out, wouldn't it be good to know at Walmart California? Week-old data happens when refreshes are too disruptive and limited to weekends.
  • You might be familiar with this cycle that we've seen in the industry, where IT department budgets are being constrained. When IT budgets are constrained, one of the first targets is reducing storage. As storage budgets are reduced, the ability to provision database copies and development environments goes down. As development environments become constrained, projects start to hit delays. As projects are delayed, the applications that the business depends on to generate revenue to pay for IT budgets are delayed, which reduces revenue as the business cannot access new applications, which in turn puts more pressure on the IT budget. It becomes a vicious circle.
  • Internet vs. browser. Automate or die – the revolution will be automated. The worst enemy of companies today is thinking that they have the best processes that exist, that their IT organizations are using the latest and greatest technology, and that nothing better exists in the field. This mentality will be the undoing of many companies. http://www.kylehailey.com/automate-or-die-the-revolution-will-be-automated/ Data IS the constraint. Business skeptics are saying to themselves that data processes are just a rounding error in most of their project timelines, and that they are sure their IT has developed processes to fix that. That's the fundamental mistake. The very large and often hidden data tax lies in all the ways that we've optimized our software, data protection, and decision systems around the expectation that data is simply not agile. The belief that there is no agility problem is part of the problem. http://www.kylehailey.com/data-is-the-constraint/
  • Due to the constraints of building cloned database environments, one ends up in the "culture of no," where developers stop asking for a copy of a production database because the answer is "no." If developers need to debug an anomaly seen in production, or need to write a custom module that requires a copy of production, they know not to even ask, and just give up.
  • "The status quo is pre-ordained failure."
  • Internet vs. browser; engine vs. car.
  • How long does it take a developer to get a copy of a database?
  • Fastest query is the query not run
  • Source syncing*: initial backup once only; continual forever change collection; purging of old data. Storage (DxFS): shared-block snapshots (unlimited, storage agnostic); compression (typically 1/3, on block boundaries; the overhead is basically undetectable); data shared in memory (super caching*). Self-service automation: virtual database provisioning, rollback, refresh*, branching*, tagging*; files mounted over NFS; init.ora, SID, database name, database unique name; security on who can see which source databases, how many clones they can make, and how much storage they can use.
  • Like the internet
  • In the physical database world, 3 clones take up 3x the storage. In the virtual world, 3 clones take up 1/3 the storage thanks to block sharing and compression.
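A rough storage model for this comparison, using the deck's own rules of thumb (~1/3 compression, ~99% of blocks shared between clones); the 300 GB database size is an arbitrary example:

```python
# Physical vs. virtual clone storage, per the deck's rules of thumb:
# clones share ~99% identical blocks and DxFS compresses to ~1/3 size.
db_size_gb = 300      # example source database (arbitrary)
n_clones = 3

physical_gb = n_clones * db_size_gb              # each clone is a full copy
shared_gb = db_size_gb / 3                       # one compressed shared image
changed_gb = n_clones * db_size_gb * 0.01        # ~1% unique blocks per clone
virtual_gb = shared_gb + changed_gb

print(physical_gb, virtual_gb)   # 900 vs. 109.0 GB
print(virtual_gb / db_size_gb)   # roughly 1/3 of a single physical copy
```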
  • Software installs on any x86 hardware, uses any storage, and supports Oracle 9.2-12c (Standard Edition and Enterprise Edition, single instance and RAC) on AIX, SPARC, HP-UX, and Linux; also supports SQL Server.
  • EMC, Netapp, Fujitsu, or newer flash storage like Violin, Pure Storage, Fusion-io, etc.
  • Delphix does a one time only copy of the source database onto Delphix
  • Giving each developer their own copy
  • Requirements: fast data refresh, rollback. Data delivery takes 480 minutes of a 500-minute test cycle (4% value): $0.04 of actual testing value per $1.00 of QA spend.
  • Multiple scripted dumps or RMAN backups are used to move data today. With application awareness, we only request changed blocks, dramatically reducing production loads by as much as 80%. We also eliminate the need for DBAs to manage custom scripts, which are expensive to maintain and support over time.
  • Physically independent but logically correlatedCloning multiple source databases at the same time can be a daunting task
  • One example from our customers is Informatica, who had a project to integrate 6 databases into one central database. The project was estimated at 12 months, with much of that coming from trying to orchestrate getting copies of the 6 databases at the same point in time. Like herding cats.
  • Walmart.com. Informatica had a 12-month project to integrate 6 databases. After installing Delphix they did it in 6 months: "I delivered this early. I generated more revenue. I freed up money and put it into innovation." They won an award with Ventana Research for this project.
  • From our experience before and after with Fortune 500 companies
  • How big is the data tax? One way we can measure it is by looking at the improvements in project timelines at companies that have eliminated this data tax by implementing a data virtualization appliance (DVA) and creating an agile data platform (ADP). Agile data is data that is delivered to the exact spot it's needed, just in time, and with much less time, cost, and effort. By comparing productivity rates after implementing an ADP to those before, we can get an idea of the price of the data tax without an ADP. IT experts building mission-critical systems for Fortune 500 companies have seen real project returns averaging 20-50% productivity increases after having implemented an ADP. That's a big data tax to pay without an ADP. The data tax is real, and once you understand how real it is, you realize how many of your key business decisions and strategies are affected by the agility of the data in your applications. "It took us 50 days to develop an insurance product … now we can get a product to the customer in 23 days with Delphix."
  • http://www.computerworld.com/s/article/9242959/The_Grill_Gino_Pokluda_gains_control_of_an_unwieldy_database_system?taxonomyId=19
  • Moral of this story: instead of dragging behind the enormous amounts of infrastructure and bureaucracy required to provide database copies, use database virtualization; it eliminates the drag and provides power and acceleration to your company. Defining moment. Competitors. Services.
  • Moving the data IS the big gorilla. Eliminating the data tax is crucial to the success of your company. And, if huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy. Agile data – virtualized data – uses a small footprint. A truly agile data platform can deliver full size datasets cheaper than subsets. A truly agile data platform can move the time or the location pointer on its data very rapidly, and can store any version that’s needed in a library at an unbelievably low cost. And, a truly agile data platform can massively improve app quality by making it reliable and dead simple to return to a common baseline for one or many databases in a very short amount of time. Applications delivered with agile data can afford a lot more full size virtual copies, eliminating wait time and extra work caused by sharing, as well as side effects. With the cost of data falling so dramatically, business can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized – and servers that sit idle because it would just take too long to rebuild can now switch roles on demand.
  • One last thing: http://www.dadbm.com/wp-content/uploads/2013/01/12c_pluggable_database_vs_separate_database.png
  • 250 PDBs x 200 GB = 50 TB. EMC sells 1 GB for ~$1,000; Dell sells 32 GB for ~$1,000. A terabyte of RAM on a Dell costs around $32,000; a terabyte of RAM on a VMAX 40k costs around $1,000,000.
  • http://www.emc.com/collateral/emcwsca/master-price-list.pdf (prices from pages 897/898): a storage engine for the VMAX 40k with 256 GB RAM is around $393,000; a storage engine for the VMAX 40k with 48 GB RAM is around $200,000. So the cost of RAM here is $193,000 / 208 GB = ~$927 a gigabyte. That seems like a good deal for EMC, as Dell sells 32 GB RAM DIMMs for just over $1,000. So a terabyte of RAM on a Dell costs around $32,000, and a terabyte of RAM on a VMAX 40k costs around $1,000,000. 2) Most DBs have a buffer cache that is less than 0.5% (not 5%, 0.5%) of the datafile size.
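The arithmetic behind those figures, using only the list prices quoted above (the per-GB rate comes from the price delta between the two engine configurations):

```python
# RAM cost per GB derived from the VMAX 40k list prices quoted above.
engine_256gb = 393_000    # storage engine with 256 GB RAM
engine_48gb = 200_000     # storage engine with 48 GB RAM

per_gb = (engine_256gb - engine_48gb) / (256 - 48)   # ~$927.88/GB
per_tb_vmax = per_gb * 1024                          # close to $1M per TB

dell_dimm = 1_000         # Dell 32 GB RAM DIMM, just over $1,000
per_tb_dell = dell_dimm / 32 * 1024                  # $32,000 per TB

print(round(per_gb), round(per_tb_vmax), round(per_tb_dell))
# -> 928 950154 32000
```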

    1. 1. Agile Data : Virtual Data Revolution Kyle@delphix.com kylehailey.com slideshare.com/khailey
    2. 2. In this presentation : • Problem in IT • Solution • Use Cases
    3. 3. In this presentation : • Problem in IT • Solution • Use Cases
    4. 4. The Phoenix Project • Bottlenecks • Metrics • Priorities • Goals • Iterations “The Goal” by E. Goldratt
    5. 5. The Phoenix Project “Any improvement not made at the constraint is an illusion.”
    6. 6. The Phoenix Project “Any improvement not made at the constraint is an illusion.” What is the constraint?
    7. 7. The Phoenix Project “Any improvement not made at the constraint is an illusion.” What is the constraint? “One of the most powerful things that IT can do is get environments to development and QA when they need it”
    8. 8. Problem in IT I. Data Constraint strains IT II. Data Constraint price is huge III. Data Constraint companies unaware
    9. 9. Problem in IT 60% Projects Over Schedule 85% delayed waiting for data Data is the Constraint CIO Magazine Survey: Current situation: only getting worse … Data Doomsday
    10. 10. I. Data Constraint strains IT If you can’t satisfy the business demands then your process is broken.
    11. 11. II. Data Constraint price is huge
    12. 12. III. Data Constraint : companies unaware
    13. 13. Data is the constraint I. Data Constraint strains IT II. Data Constraint price is huge III. Data Constraint companies unaware
    14. 14. I. Data Constraint companies unaware – Moving data is hard – Triple tax – Data Floods infrastructure
    15. 15. I. Data Constraint : moving data is hard – Storage & Systems – Personnel – Time
    16. 16. Typical Architecture Production Instance File system Database
    17. 17. Typical Architecture Production Instance Backup File system Database File system Database
    18. 18. Typical Architecture Production Instance Reporting Backup File system Database Instance File system Database File system Database
    19. 19. Typical Architecture Production Instance File system Database Instance File system Database File system Database File system Database Instance Instance Instance File system Database File system Database Dev, QA, UAT Reporting Backup Triple Tax
    20. 20. Typical Architecture Production Instance File system Database Instance File system Database File system Database File system Database Instance Instance Instance File system Database File system Database
    21. 21. I. Data constraint: Data floods company infrastructure 92% of the cost of business, in the financial services business, is "data" www.wsta.org/resources/industry-articles Most companies have 2-9% IT spending http://uclue.com/?xq=1133 Data management is the largest part of IT expense Gartner: Data Doomsday
    22. 22. Data is the constraint I. Data Constraint strains IT II. Data Constraint price is huge III. Data Constraint companies unaware
    23. 23. Part II. Data constraint price is Huge
    24. 24. Part II. Data constraint price is Huge • Four Areas data tax hits 1. IT Capital resources 2. IT Operations personnel 3. Application Development 4. Business
    25. 25. Part II. Data constraint price is Huge • Four Areas data tax hits 1. IT Capital resources 2. IT Operations personnel 3. Application Development 4. Business
    26. 26. II. Data constraint price is huge : 1. IT Capital • Hardware –Servers –Storage –Network –Data center floor space, power, cooling
    27. 27. Part II. Data constraint price is Huge • Four Areas data tax hits 1. IT Capital resources 2. IT Operations personnel 3. Application Development 4. Business
    28. 28. II. Data constraint price is huge : 2. IT Operations • People – DBAs – SYS Admin – Storage Admin – Backup Admin – Network Admin • Hours : 1000s just for DBAs • $100s Millions for data center modernizations
    29. 29. Part II. Data constraint price is Huge • Four Areas data tax hits 1. IT Capital resources 2. IT Operations personnel 3. Application Development 4. Business
    30. 30. II. Data constraint price is Huge : 3. App Dev • Inefficient QA: Higher costs of QA • QA Delays : Greater re-work of code • Sharing DB Environments : Bottlenecks • Using DB Subsets: More bugs in Prod • Slow Environment Builds: Delays “if you can't measure it you can’t manage it”
    31. 31. II. Data Tax is Huge : 3. App Dev Long Build Time QA Test 96% of QA time was building environment $.04/$1.00 actual testing vs. setup Build
    32. 32. II. Data Tax is Huge : 3. App Dev Build QA Env QA Build QA Env QA Sprint 1 Sprint 2 Sprint 3 Bug Code X [chart: cost to correct vs. delay in fixing the bug] Software Engineering Economics – Barry Boehm (1981)
    33. 33. II. Data Tax is Huge : 3. App Dev full copies cause bottlenecks Frustration Waiting Old Unrepresentative Data
    34. 34. II. Data Tax is Huge : 3. App Dev subsets cause bugs
    35. 35. II. Data Tax is Huge : 3. App Dev subsets cause bugs The Production ‘Wall’
    36. 36. II. Data Tax is Huge : 3. App Dev Developer Asks for DB Get Access Manager approves DBA Request system Setup DB System Admin Request storage Setup machine Storage Admin Allocate storage (take snapshot) 3-6 Months to Deliver Data
    37. 37. II. Data Tax is Huge : 3. App Dev Why are hand-offs so expensive? 1 hour / 1 day / 9 days
    38. 38. II. Data Tax is Huge : 3. App Dev Slow Environment Builds Never enough environments
    39. 39. Part II. Data constraint price is Huge • Four Areas data tax hits 1. IT Capital resources 2. IT Operations personnel 3. Application Development 4. Business
    40. 40. II. Data constraint price is Huge : 4. Business Ability to capture revenue • Business Intelligence – Old data = less intelligence • Business Applications – Delays cause => Lost Revenue
    41. 41. II. Data constraint price is Huge : 4. Business
    42. 42. II. Data constraint price is Huge : 4. Business [chart: Storage, IT Ops, Dev, Revenue, in billions of $]
    43. 43. Data is the constraint I. Data Constraint strains IT II. Data Constraint price is huge III. Data Constraint companies unaware
    44. 44. Part III. Data Constraint companies unaware
    45. 45. III. Data Constraint companies unaware DBA Developer
    46. 46. III. Data Constraint companies unaware #1 Biggest Enemy : IT departments believe – best processes – greatest technology – Just the way it is
    47. 47. III. Data Constraint companies unaware Why do I need an iPhone ? Don’t we already do that ?
    48. 48. III. Data Constraint companies unaware • Ask Questions – me: we provision environments in minutes for almost no extra storage. – Customer: We already do that – me: How long does it take a developer to get an environment after they ask? – Customer: 2-3 weeks – me: we do it in 2-3 minutes
    49. 49. III. Data Constraint companies unaware How to enlighten? Ask for metrics – How old is data in • BI and DW : ETL windows • QA and Dev : how often refreshed – How long does it take a developer to get a DB copy? – How long does it take QA to setup an environment
    50. 50. Data is the constraint I. Data Constraint strains IT II. Data Constraint price is huge III. Data Constraint companies unaware
    51. 51. In this presentation : • Problem in the Industry • Solution • Use Cases
    52. 52. Clone 1 Clone 2 Clone 3 99% of blocks are identical
    53. 53. Solution
    54. 54. Clone 1 Clone 2 Clone 3 Thin Clone
    55. 55. Technology Core : file system snapshots • Vmware Linked Clones – Not supported for Oracle • EMC – 16 snapshots – Write performance impact • Netapp – 255 snapshots • ZFS – Unlimited snapshots
    56. 56. III. Companies unaware of the Data Tax
    57. 57. Three Core Parts Production File System Instance Development Storage 1 2 3 Copy Sync Snapshots Time Flow Purge Clone (snapshot) Compress Share Cache Storage Mount, recover, rename Self Service, Roles & Security Rollback & Refresh Branch & Tag Instance
    58. 58. Three Core Parts Production File System Instance Development Storage 1 2 3 Copy Sync Snapshots Time Flow Purge Clone (snapshot) Compress Share Cache Storage Mount, recover, rename Self Service, Roles & Security Rollback & Refresh Branch & Tag Instance
    59. 59. 3. Database Virtualization
    60. 60. Three Physical Copies Three Virtual Copies Data Virtualization Appliance
    61. 61. Install Delphix on any x86 (Intel) hardware
    62. 62. Allocate Any Storage to Delphix Allocate Storage Any type Pure Storage + Delphix Better Performance for 1/10 the cost
    63. 63. One time backup of source database Database Production File system File system Upcoming Supports Instance Instance Instance Application Stack Data
    64. 64. DxFS (Delphix) Compress Data Database Production Data is compressed typically to 1/3 size File system Instance Instance Instance
    65. 65. Incremental forever change collection Database Production File system Changes • Collected incrementally forever • Old data purged File system Time Window Production Instance Instance Instance
    66. 66. Source Full Copy Source backup from SCN 1
    67. 67. Snapshot 1 Snapshot 2
    68. 68. Snapshot 1 Snapshot 2 Backup from SCN
    69. 69. Snapshot 1 Snapshot 2 Snapshot 3
    70. 70. Drop Snapshot Snapshot 1 Snapshot 2 Snapshot 3 Snapshot 2 Snapshot 3 Drop Snapshot 1
    71. 71. Virtual DB 71 / 30 Jonathan Lewis © 2013 Snapshot 1 – full backup once only at link time a b c d e f g h i We start with a full backup - analogous to a level 0 rman backup. Includes the archived redo log files needed for recovery. Run in archivelog mode.
    72. 72. Virtual DB 72 / 30 Jonathan Lewis © 2013 Snapshot 2 (from SCN) b' c' a b c d e f g h i The "backup from SCN" is analogous to a level 1 incremental backup (which includes the relevant archived redo logs). Sensible to enable BCT. Delphix executes standard rman scripts
    73. 73. Virtual DB 73 / 30 Jonathan Lewis © 2013 a b c d e f g h i Apply Snapshot 2 b' c' The Delphix appliance unpacks the rman backup and "overwrites" the initial backup with the changed blocks - but DxFS makes new copies of the blocks b' c'
    74. 74. Virtual DB 74 / 30 Jonathan Lewis © 2013 Derived Full Backup at Snapshot 2 b' c'a d e f g h i The call to rman leaves us with a new level 0 backup, waiting for recovery. But we can pick the snapshot root block. We have EVERY level 0 backup
    75. 75. Virtual DB 75 / 30 Jonathan Lewis © 2013 Creating a vDB b' c'a d e f g h i The first step in creating a vDB is to take a snapshot of the filesystem as at the backup you want (then roll it forward) My vDB (filesystem) Your vDB (filesystem)
    76. 76. Virtual DB 76 / 30 Jonathan Lewis © 2013 Creating a vDB b' c'a d e f g h i The first step in creating a vDB is to take a snapshot of the filesystem as at the backup you want (then roll it forward) My vDB (filesystem) Your vDB (filesystem) i’
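The copy-on-write mechanics in the preceding slides can be sketched as a toy model (illustrative only; `CowStore` is a made-up class, and real DxFS/ZFS internals are far more involved): a vDB is just a private copy of a block map, and a write duplicates only the block it touches.

```python
# Toy copy-on-write model of the snapshot/vDB mechanics above.
class CowStore:
    def __init__(self, blocks):
        self.blocks = list(blocks)                         # shared block pool
        self.maps = {"source": list(range(len(blocks)))}   # name -> block map

    def clone(self, src, dst):
        # Creating a vDB copies only the block map, not the blocks.
        self.maps[dst] = list(self.maps[src])

    def write(self, name, pos, data):
        # Copy-on-write: append a new block, repoint this map entry only.
        self.blocks.append(data)
        self.maps[name][pos] = len(self.blocks) - 1

    def read(self, name):
        return [self.blocks[i] for i in self.maps[name]]

store = CowStore(["a", "b", "c", "d"])     # the initial full backup blocks
store.clone("source", "my_vdb")
store.clone("source", "your_vdb")
store.write("my_vdb", 1, "b'")             # only block b is duplicated

print(store.read("source"))    # ['a', 'b', 'c', 'd'] -- untouched
print(store.read("my_vdb"))    # ['a', "b'", 'c', 'd']
print(len(store.blocks))       # 5 blocks back three 4-block databases
```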
    77. Cloning (diagram: production database instance, file system time window, multiple cloned database instances).
    78. In this presentation: • Problem in the Industry • Solution • Use Cases
    79. Use Cases 1. Development 2. QA 3. Recovery 4. Business Intelligence 5. Modernization
    80. Use Cases 1. Development 2. QA 3. Recovery 4. Business Intelligence 5. Modernization
    81. Development • Parallelized Environments • Full-size Environments • Self-Service Development
    82. Development: Parallelize Environments (gif by Steve Karam)
    83. Development: Full-size copies
    84. Development: Self Service
    85. Use Cases 1. Development 2. QA 3. Recovery 4. Business Intelligence 5. Modernization
    86. QA • Fast • Parallel • Rollback • A/B testing
    87. QA: Fast environments with branching (diagram: Source, Dev, and a QA environment branched from Dev).
    88. QA: Fast environments with branching. With branching, only 1% of QA time is spent building the environment: $0.99 of every $1.00 goes to actual testing rather than setup.
    89. QA: Bugs found fast (diagram: with slow environment builds, a bug introduced in Sprint 1 is not caught by QA until later sprints; with fast builds, QA runs within each sprint and the bug is caught immediately).
    90. QA: Parallel environments (diagram: one source feeding multiple instances).
    91. QA: Rewind for patch and QA testing.
    92. QA: A/B testing (two clones, Index 1 vs. Index 2).
    93. Use Cases 1. Development 2. QA 3. Quality 4. Business Intelligence 5. Modernization
    94. Quality 1. Prod & Dev Backups 2. Surgical Recovery 3. Recovery of Production 4. Recovery of Development 5. Bug Forensics
    95. Quality: 50 days of backups in the size of production.
    96. Quality: Surgical recovery (diagram: roll the time window back to just before the drop).
    97. Quality: Recovery of development (diagram: Dev1 VDB with its own time window; Dev2 VDB branched with its own time window).
    98. Quality: Recovery of production (diagram: a VDB provisioned from the source's time window, from before the corruption).
    99. Forensics: Investigate production bugs (diagram: provision a copy of the database as of yesterday, when the bug occurred, into development).
    100. Use Cases 1. Development 2. QA 3. Quality 4. Business Intelligence 5. Modernization
    101. Business Intelligence • 24x7 Batches • Low Bandwidth • Temporal Data • Confidence Testing
    102. Business Intelligence: ETL and refresh windows (timeline: 1pm, 10pm, 8am, noon).
    103. Business Intelligence: ETL and DW refreshes taking longer each year (2011-2015).
    104. Business Intelligence: ETL and refresh windows (2011-2015: the windows grow and shift year over year).
    105. Business Intelligence: ETL and DW refreshes (diagram: prod instance feeding DW & BI). Data Guard requires a full refresh if the standby is used; Active Data Guard is read-only, so most reports don't work.
    106. Business Intelligence: Fast refreshes • Collect only changes • Refresh in minutes (diagram: prod instance feeding BI and DW, with ETL running 24x7).
    107. Business Intelligence: Temporal Data
    108. Business Intelligence a) 24x7 batches & refreshes b) Temporal queries c) Confidence testing
    109. Use Cases 1. Development 2. QA 3. Quality 4. Business Intelligence 5. Modernization
    110. Modernization 1. Federated 2. Consolidation 3. Migration 4. Auditing
    111. Modernization: Federated (diagram: instances built from Source1 and Source2 combined).
    112. Modernization: Federated
    113. "I looked like a hero" - Tony Young, CIO, Informatica (Modernization: Federated)
    114. Modernization: Data center migration (diagram: 5x the source data in copies shrinks to less than 1x with virtual copies; S = source, C = copy, V = virtual copy).
    115. Modernization: Consolidation (Without Delphix vs. With Delphix)
    116. Modernization: Auditing & Version Control. Data Control = Source Control for the Database (diagram: Dev, QA, and UAT environments for releases 2.6, 2.7, and 2.8 flowing into production over time). Cited portfolios: a CIO in insurance with 600 applications, a CIO in investment banking with 180, a CIO in South America with 65.
    117. Use Case Summary 1. Development 2. QA 3. Quality 4. Business Intelligence 5. Performance Acceleration
    118. How expensive is the data constraint? Measured before and after Delphix with Fortune 500 customers: median app dev throughput increased by 2x.
    119. How expensive is the data constraint? • 10x faster financial close • 9x faster BI refreshes • 2x faster projects • 20% fewer bugs
    120. Agile Data Quotes • "Allowed us to shrink our project schedule from 12 months to 6 months." - BA Scott, NYL VP App Dev • "It used to take 50-some-odd days to develop an insurance product, … Now we can get a product to the customer in about 23 days." - Presbyterian Health • "Can't imagine working without it" - Ramesh Shrinivasan, CA Department of General Services
    121. Summary • Problem: Data is the constraint • Solution: Agile data is small & fast • Results: Deliver projects in half the time, with higher quality and increased revenue. Kyle@delphix.com kylehailey.com slideshare.net/khailey
    122. Future Now • Application stack cloning • Cross-platform cloning: UNIX -> Linux • Postgres coming • VM cloning • Workflows: Chef, Puppet, etc. workflows for virtual data provisioning • Developer workspaces: check out, check in, bookmark, tagging, rollback, refresh • Secure data: masking • More databases: MySQL, Sybase, DB2, Hadoop, Mongo, Cassandra • DR and HA
    123. Oracle 12c
    124. 80MB buffer cache?
    125. 200GB cache
    126. 5,000 txns/min at 300 ms latency (chart: 1 to 200 users).
    127. 8,000 txns/min at 600 ms latency (chart: 1 to 200 users).
    128. Five 200GB database copies are cached with: a $1,000,000 1TB cache on a SAN, or a $6,000 200GB shared cache on Delphix.
