
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017

Andrew Miller & Rebecca Fitzhugh Session from VMworld US

  1. Architecting Data Protection with Rubrik: Andrew Miller, Rebecca Fitzhugh (MGT3342BUS) #VMworld #MGT3342BUS
  2. Disclaimer • This presentation may contain product features that are currently under development. • This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. • Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Technical feasibility and market demand will affect final delivery. • Pricing and packaging for any new technologies or features discussed or presented have not been determined.
  3. Rebecca Fitzhugh • Tweet: @rebeccafitzhugh • Blogger: technicloud.com • Co-Host: vbrownbag.com • I have a job!: Rubrik.com • Author: vSphere Virtual Machine Management; Learning VMware vSphere • VMware: VCDX #243
  4. Andrew Miller • Tweet: @andriven • Blogger: thinkmeta.net • TMM: Rubrik.com • Background: 7 years customer, 8 years partner • Certs: lots of random ones • VMware: vExpert (6x)
  5. Agenda? Nah… • Share data protection architecture knowledge (more than half) • Show where Rubrik fits technically + demo (less than half) • Fair? (Q&A too)
  6. Why bother? One big reason… Business expectations of disaster recovery / data protection != IT capabilities for disaster recovery / data protection
  7. What Are You Really Protecting Yourself Against? • Lost or postponed sales and income • Regulatory fines • Delay of new business plans • Loss of contractual bonuses • Customer dissatisfaction • Timing and duration of disruption • Increased expenses such as overtime labor and outsourcing • Employee burnout
  8. What is a Disaster? Disaster: An event that affects a service or system such that significant effort is required to restore the original performance level. • But what does that look like IN OUR ENVIRONMENT? • What disaster and recovery scenarios should we plan for?
  9. Sabotage!
  10. Natural Disaster
  11. Natural Disaster
  12. Natural Disaster
  13. Natural Disaster
  14. Power Loss
  15. Power Loss
  16. Power Loss
  17. What is the most common scenario for disaster?
  18. What is a Disaster? Disaster: An event that affects a service or system such that significant effort is required to restore the original performance level. • But what does that look like IN OUR ENVIRONMENT? • What disaster and recovery scenarios should we plan for? • Where do we begin? • How do we do it?
  19. What is a Business Impact Analysis (BIA)? • A process to understand: – What is the monetary impact of a disaster or failure? – What are the most time-critical and information-critical business processes? – How does the business REALLY rely upon IT service and application availability? – What availability or recoverability capabilities are justifiable based on these requirements, potential impact, and costs? • Composed of two components: – Technical discovery (data gathering) – Human conversation (talk to people!)
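A minimal sketch of the BIA's monetary-impact question, assuming a simplified linear model: hourly losses (lost sales, overtime, outsourcing) accrue for the whole outage, plus one-time penalties such as regulatory fines. `ServiceImpact`, `outage_cost`, and every figure below are hypothetical illustrations, not BIA data from the session.

```python
from dataclasses import dataclass


@dataclass
class ServiceImpact:
    """Hypothetical per-service inputs gathered during a BIA."""
    name: str
    revenue_lost_per_hour: float      # lost or postponed sales and income
    extra_cost_per_hour: float        # e.g. overtime labor, outsourcing
    one_time_penalties: float = 0.0   # e.g. regulatory fines, lost contractual bonuses


def outage_cost(service: ServiceImpact, outage_hours: float) -> float:
    """Estimate the cost of one outage with a simple linear model.

    Assumes hourly losses are constant for the whole outage and that the
    one-time penalties apply to any outage, however short.
    """
    if outage_hours < 0:
        raise ValueError("outage_hours cannot be negative")
    hourly = service.revenue_lost_per_hour + service.extra_cost_per_hour
    return hourly * outage_hours + service.one_time_penalties


if __name__ == "__main__":
    # Illustrative numbers only.
    orders = ServiceImpact("order-entry", 12_000, 1_500, one_time_penalties=25_000)
    for hours in (1, 8, 48):
        print(f"{orders.name}: {hours:>2} h outage -> ${outage_cost(orders, hours):,.0f}")
```

Setting these estimates against the price of a given recovery capability is one way to answer the "what is justifiable" question on the slide.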
  20. Example Output – Priority Tiers
      • Priority 1 – High availability / immediate recovery: Services whose unavailability for more than a brief period can have a severe impact on customers or time-critical business operations.
      • Priority 2 – 1-2 day recovery: Services whose unavailability significantly impacts customers or business operations.
      • Priority 3 – 3-5 day recovery: Services that can tolerate up to five days of disruption in a disaster.
      • Priority 4 – 6-10 day recovery: Services that can tolerate up to ten days of disruption in a disaster. Priority 3 and 4 systems may be restored in less time, depending on the situation; however, higher-priority functions will be restored first.
      • Priority 5 – "Best effort" recovery: Non-critical services that can tolerate two weeks or more of disruption in a disaster. These systems will be restored on a best-effort basis, after other more critical systems have been restored and ongoing operations have resumed. Priority 5 systems may be restored in less time, depending on the situation; however, higher-priority functions will be restored first. In some cases, systems deemed not required for continued operations may not be restored.
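One way to turn BIA answers into these tiers, sketched under an assumption the slide does not state: a service lands in the lowest-priority tier whose worst-case recovery time still fits inside the downtime it can tolerate. `priority_tier` and the threshold table are hypothetical.

```python
# Hypothetical thresholds: (worst-case recovery time of the tier in days, tier).
# Priority 1 has no entry: it is the fallback for anything under 2 days.
TIER_MAX_RECOVERY_DAYS = [
    (14, 5),  # best effort: two weeks or more
    (10, 4),  # 6-10 day recovery
    (5, 3),   # 3-5 day recovery
    (2, 2),   # 1-2 day recovery
]


def priority_tier(tolerable_downtime_days: float) -> int:
    """Return the lowest-priority tier whose worst-case recovery time fits
    within the downtime the service can tolerate."""
    if tolerable_downtime_days < 0:
        raise ValueError("tolerable downtime cannot be negative")
    for max_recovery_days, tier in TIER_MAX_RECOVERY_DAYS:
        if tolerable_downtime_days >= max_recovery_days:
            return tier
    return 1  # high availability / immediate recovery


if __name__ == "__main__":
    for days in (0.25, 3, 7, 12, 30):
        print(f"tolerates {days} days -> Priority {priority_tier(days)}")
```

Under this rule a service that tolerates three days lands in Priority 2, not 3, because Priority 3's recovery window runs to five days.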
  21. What is an SLA? • A contract between an external service provider and its customers, or between an IT department and the internal business units it serves.
  22. What is an SLA? • Two 9’s – 99% = 3.65 days of downtime per year (easy to achieve, less expensive) • Three 9’s – 99.9% = 8.76 hours of downtime per year • Four 9’s – 99.99% = 52.6 minutes of downtime per year • Five 9’s – 99.999% = 5.26 minutes of downtime per year (difficult to achieve, expensive!)
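The downtime figures follow from one line of arithmetic: allowed downtime = (1 - availability) × 8,760 hours in a 365-day year. For 99.9%, that is 0.001 × 8,760 = 8.76 hours. A small sketch that reproduces the slide's numbers (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; a non-leap year, matching the slide's figures


def annual_downtime_hours(availability_percent: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("Two", 99.0), ("Three", 99.9), ("Four", 99.99), ("Five", 99.999)]:
        hours = annual_downtime_hours(pct)
        print(f"{label} 9's ({pct}%): {hours / 24:.2f} days = {hours:.2f} h = {hours * 60:.1f} min")
```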
  23. Disaster Recovery: Key Measures • Recovery Point Objective (RPO): the amount of data lost in a failure, measured as time back from the disaster event • Recovery Time Objective (RTO): the targeted amount of time to restart a business service after a disaster event [Timeline figure: hourly marks from 5 a.m. to 7 p.m., with "Declare Disaster" at 10 a.m.; RPO is measured back from that point and RTO forward from it.]
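A minimal sketch of checking one incident against its objectives, assuming the RTO clock starts at the disaster event itself (not at a later declaration). The function names and the 8 a.m. backup / 1 p.m. restore times are hypothetical; only the 10 a.m. disaster comes from the timeline.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_good_copy: datetime, disaster: datetime,
                     service_restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data loss window, downtime) for one incident.

    Assumes the RTO clock starts at the disaster event itself.
    """
    if not last_good_copy <= disaster <= service_restored:
        raise ValueError("expected last_good_copy <= disaster <= service_restored")
    return disaster - last_good_copy, service_restored - disaster


def meets_objectives(last_good_copy: datetime, disaster: datetime,
                     service_restored: datetime, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data loss and the downtime stay within their objectives."""
    data_loss, downtime = achieved_rpo_rto(last_good_copy, disaster, service_restored)
    return data_loss <= rpo and downtime <= rto


if __name__ == "__main__":
    day = datetime(2017, 1, 1)  # arbitrary date
    backup, event, restored = day.replace(hour=8), day.replace(hour=10), day.replace(hour=13)
    print(achieved_rpo_rto(backup, event, restored))
    print(meets_objectives(backup, event, restored, rpo=timedelta(hours=4), rto=timedelta(hours=2)))
```

Here two hours of data are lost (within a 4-hour RPO) but the service is down for three hours, which misses a 2-hour RTO.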
  24. Disaster Recovery: Key Measures [Figure: cost plotted against recovery point and recovery time, each scale running weeks, days, hours, minutes, seconds, real time.]
  25. BC vs DR vs OR – Say What? • Business Continuity – All goes on as normal despite an incident – Could lose a site and have no impact on business operations (active/active sites) • Disaster Recovery – To cope with & recover from an IT crisis that moves work to an alternative system in a non-routine way – A real "disaster" is large in scope and impact – DR typically implies failure of the primary data center and recovery to an alternate site • Operational Recovery – Addresses more "routine" types of failures (server, network, storage, etc.) – Events are smaller in scope and impact than a full disaster – Typically implies recovering to alternate equipment within the primary data center • Each should have its own clearly defined objectives – at minimum, know the difference.
  26. Where Rubrik Helps: Let’s keep it architecture-focused.
  27. Complexity Is the Enemy: Whatever you do. Whatever you buy. Simplify your architecture & expect more.
  28. Key Evaluation Criteria: What we’ve seen that makes a difference… 1. Reliability of data recovery a. Simplicity of setup and Day 2 operations – SLA policies!
  29. Data Management: 1990s to Present [Figure: the legacy stack built up from the 1990s and 2000s to the present: backup & replication software, backup software, backup servers, backup proxies, replication, catalog database, backup storage, dedupe, metadata, tape, and off-site archive.]
  30. In Two Words: Sad Panda
  31. Meet Rubrik Cloud Data Management [Figure: the legacy components from the previous slide (backup software, backup servers, backup proxies, replication, catalog database, backup storage, dedupe, metadata, tape, off-site archive) alongside private and public cloud.] Software fabric for orchestrating apps and data across clouds. No forklift upgrades.
  32. How It Works • Quick Start: Rack and go. Auto-discovery. • Rapid Ingest: Flash-optimized, parallel ingest accelerates snapshots and eliminates stun. Content-aware dedupe. One global namespace. • Automate: Intelligent SLA policy engine for effortless management. • Instant Recovery: Live Mount VMs & SQL. Instant search and file restore. • Secure: End-to-end encryption. Immutability to fight ransomware. • Cloud: "CloudOut" instantly accessible with global search. Launch apps with "CloudOn" for DR or test/dev. Run apps in cloud. [Figure: primary environment (VMware, Hyper-V, AHV, NAS) feeding the SLA policy engine and log management, with private and public cloud targets.]
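Deduplication across snapshots is commonly done by content addressing: split data into chunks, hash each chunk, and store each unique hash once. The sketch below uses fixed-size chunks and SHA-256 as simplifying assumptions; `DedupeStore` is a hypothetical toy and not a description of Rubrik's dedupe engine.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by
    its SHA-256 digest. Fixed-size chunking is used for simplicity."""

    def __init__(self, chunk_size: int = 4096):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # digest -> chunk bytes

    def put(self, data: bytes) -> list[str]:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a repeated chunk costs no extra space
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held, counting each unique chunk once."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupeStore(chunk_size=4)
    monday = store.put(b"AAAABBBBCCCC")   # first snapshot
    tuesday = store.put(b"AAAABBBBDDDD")  # only the last chunk changed
    logical = len(store.get(monday)) + len(store.get(tuesday))
    print("logical bytes:", logical, "stored bytes:", store.stored_bytes())
```

The second snapshot adds only its one changed chunk, so 24 logical bytes occupy 16 stored bytes.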
  33. Your Data Center Today [Figure: production servers and SAN, with a backup proxy, backup server, search server, disk-based backup, tape archive, and off-site tape vault.]
  34. Rubrik Simplifies Your Data Center [Figure: production servers and SAN protected by a scale-out Rubrik cluster providing replication, long-term retention, and search, with a private cloud target.]
  35. Data Management in the Cloud [Figure: on premises, applications & data and storage protected by Rubrik; in the cloud, an Azure instance with Blob storage, and cloud-native applications & data on an EC2 instance running Rubrik; use cases: backup, replication, archival, analytics.]
  36. SLA = { Recovery Point Objective (RPO), Availability Duration (Retention), When to Archive (RTO), Replication Schedule (DR) }
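The four SLA inputs can be pictured as a single policy object. This is a hypothetical model for illustration, not Rubrik's API: `SlaPolicy` and its field names are assumptions, and the helpers assume snapshots on a fixed interval that complete instantly.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical model of the four inputs on the slide (not Rubrik's API)."""
    name: str
    snapshot_every: timedelta                    # frequency; sets the RPO
    keep_for: timedelta                          # availability duration (retention)
    archive_after: Optional[timedelta] = None    # when to archive (paired with RTO on the slide)
    replicate_every: Optional[timedelta] = None  # replication schedule (DR)

    def __post_init__(self):
        if self.snapshot_every <= timedelta(0):
            raise ValueError("snapshot_every must be positive")
        if self.keep_for < self.snapshot_every:
            raise ValueError("keep_for must be at least one snapshot interval")

    def worst_case_rpo(self) -> timedelta:
        # Failing just before the next scheduled snapshot loses one full interval
        # (ignores the time a snapshot itself takes).
        return self.snapshot_every

    def snapshots_retained(self) -> int:
        # Steady-state count of snapshots inside the retention window.
        return self.keep_for // self.snapshot_every


if __name__ == "__main__":
    gold = SlaPolicy("Gold", snapshot_every=timedelta(hours=4), keep_for=timedelta(days=7),
                     replicate_every=timedelta(hours=4))
    print(gold.name, gold.worst_case_rpo(), gold.snapshots_retained())
```

A 4-hour frequency kept for 7 days works out to 6 snapshots per day × 7 days = 42 retained snapshots, with up to 4 hours of data at risk.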
  37. Let’s Demo! What does it look like?
  38. Key Evaluation Criteria: What we’ve seen that makes a difference… 1. Reliability of data recovery a. Simplicity of setup and Day 2 operations – SLA policies! b. Immutability – is your data there when you need it?
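To make "immutability" concrete, here is a toy write-once store: a snapshot can be written once, never overwritten, and not deleted until its retention lock expires. `ImmutableSnapshotStore` is hypothetical, and an in-process class only illustrates the semantics; it is not a security control.

```python
from datetime import datetime


class ImmutableSnapshotStore:
    """Toy write-once store: a snapshot can be written once, never overwritten,
    and not deleted until its retention lock expires."""

    def __init__(self):
        self._snapshots: dict[str, tuple[bytes, datetime]] = {}

    def add(self, snap_id: str, data: bytes, locked_until: datetime) -> None:
        if snap_id in self._snapshots:
            raise PermissionError(f"{snap_id} already exists and cannot be overwritten")
        self._snapshots[snap_id] = (bytes(data), locked_until)

    def read(self, snap_id: str) -> bytes:
        return self._snapshots[snap_id][0]

    def delete(self, snap_id: str, now: datetime) -> None:
        _, locked_until = self._snapshots[snap_id]
        if now < locked_until:
            raise PermissionError(f"{snap_id} is retention-locked until {locked_until}")
        del self._snapshots[snap_id]


if __name__ == "__main__":
    store = ImmutableSnapshotStore()
    store.add("vm42-0800", b"clean data", locked_until=datetime(2017, 1, 8))
    try:  # a ransomware-style overwrite attempt
        store.add("vm42-0800", b"encrypted garbage", locked_until=datetime(2017, 1, 8))
    except PermissionError as err:
        print("blocked:", err)
    print(store.read("vm42-0800"))
```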
  39. Under the Hood: "The Interface," "The Logic," "The Core" • Distributed Task Framework • Callisto – distributed metadata service • Cluster Management • Global Search • Cerebro – data management • Crystal – UI / API • Infinity – ecosystem integration • Thor – Cloud Connect • Atlas – cloud-scale file system • NFS
  40. Key Evaluation Criteria: What we’ve seen that makes a difference… 1. Reliability of data recovery a. Simplicity of setup and Day 2 operations – SLA policies! b. Immutability – is your data there when you need it? 2. Speed of data recovery a. Search + Live Mount
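Search speeds recovery because you learn which snapshots hold a file without mounting each one. A toy global file index, assuming '/'-separated paths and a simple substring scan over file names; `SnapshotFileIndex` is hypothetical and not Rubrik's search implementation.

```python
from collections import defaultdict


class SnapshotFileIndex:
    """Toy global file index: file name -> every (snapshot, path) that holds it.
    Assumes '/'-separated paths."""

    def __init__(self):
        self._by_name = defaultdict(set)  # lowercase file name -> {(snapshot_id, path)}

    def add_snapshot(self, snapshot_id: str, paths: list[str]) -> None:
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()
            self._by_name[name].add((snapshot_id, path))

    def search(self, query: str) -> list[tuple[str, str]]:
        """Case-insensitive substring match on file names, across all snapshots."""
        q = query.lower()
        hits = set()
        for name, locations in self._by_name.items():
            if q in name:
                hits |= locations
        return sorted(hits)


if __name__ == "__main__":
    index = SnapshotFileIndex()
    index.add_snapshot("fileserver-mon", ["/share/finance/Q3-report.xlsx", "/share/hr/policy.pdf"])
    index.add_snapshot("fileserver-tue", ["/share/finance/Q3-report.xlsx"])
    for snapshot_id, path in index.search("q3-report"):
        print(snapshot_id, path)
```

Each hit names a snapshot and path, which is what you would then mount or restore from.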
  41. Let’s Demo! What does it look like?
  42. Rubrik Backup / Recovery + DR [Figure: production servers and SAN; Rubrik as backup software + dedupe storage, with replication, long-term retention, and search; Rubrik replication & DR with DR servers; private cloud.]
  43. Key Evaluation Criteria: What we’ve seen that makes a difference… 1. Reliability of data recovery a. Simplicity of setup and Day 2 operations – SLA policies! b. Immutability – is your data there when you need it? 2. Speed of data recovery a. Search + Live Mount b. API usage / automation to enhance restore capabilities
  44. Oh… By the Way: Use an API-first platform to create powerful automation workflows that can be integrated with any service that supports outbound REST. Now with OpenAPI. [Figure: "Your App" calling the platform over REST.]
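A minimal sketch of the kind of REST automation described here, using only Python's standard library. The endpoint path, the `sla` JSON field, and the bearer-token scheme are hypothetical placeholders, not Rubrik's documented API; the function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint template for illustration only; look up the real
# path and payload in your platform's API documentation.
SNAPSHOT_PATH = "/api/example/vms/{vm_id}/on-demand-snapshot"


def build_snapshot_request(base_url: str, token: str, vm_id: str,
                           sla_name: str) -> urllib.request.Request:
    """Build, but do not send, a POST asking for an on-demand snapshot."""
    path = SNAPSHOT_PATH.format(vm_id=urllib.parse.quote(vm_id, safe=""))
    body = json.dumps({"sla": sla_name}).encode("utf-8")  # hypothetical field
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token auth
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_snapshot_request("https://backup.example.com", "TOKEN", "vm-42", "Gold")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

Keeping request construction separate from sending makes the workflow easy to test and to call from an orchestration tool instead of a console.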
  45. One More Demo! Wait a minute… we’ve been doing them already.
  46. What did you see? • Easy integration with vSphere • Working with an SLA policy • Real-time data search
  47. Don’t Backup. Go Forward.
  48. Andrew Miller | andrew@rubrik.com | @andriven • Rebecca Fitzhugh | rebecca@rubrik.com | @rebeccafitzhugh
