MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017

Andrew Miller
Rebecca Fitzhugh
MGT3342BUS
#VMworld #MGT3342BUS
Architecting Data Protection with Rubrik

Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

Rebecca Fitzhugh
• Tweet: @rebeccafitzhugh
• Blogger: technicloud.com
• Co-Host: vbrownbag.com
• I have a job!: Rubrik.com
• Author: vSphere Virtual Machine Management; Learning VMware vSphere
• VMware: VCDX #243

Andrew Miller
• Tweet: @andriven
• Blogger: thinkmeta.net
• TMM: Rubrik.com
• Background: 7 years customer, 8 years partner.
• Certs: Lots of Random Ones
• VMware: vExpert (6x)

Agenda? Nah…
• Share Data Protection Architecture Knowledge (more than half)
• Show Where Rubrik Fits Technically + Demo (less than half)
Fair? (Q&A Too)

Why bother? One big reason…
Business Expectations of Disaster Recovery / Data Protection != IT Capabilities for Disaster Recovery / Data Protection

What Are You Really Protecting Yourself Against?
• Lost or postponed sales and income
• Regulatory fines
• Delay of new business plans
• Loss of contractual bonuses
• Customer dissatisfaction
• Timing and duration of disruption
• Increased expenses such as overtime labor and outsourcing
• Employee Burnout

What is a Disaster?
Disaster: An event that affects a service or system such that significant effort is required to restore the original performance level.
• But what does that look like IN OUR ENVIRONMENT?
• What disaster and recovery scenarios should we plan for?

What is the most common scenario for disaster?

What is a Disaster?
• Where do we begin?
• How do we do it?

What is a Business Impact Analysis (BIA)?
• A process to understand:
– What is the monetary impact of a disaster or failure?
– What are the most time-critical and information-critical business processes?
– How does the business REALLY rely upon IT Service and Application availability?
– What availability or recoverability capabilities are justifiable based on these requirements, potential impact, and costs?
• Composed of two components
– Technical Discovery – Data Gathering
– Human Conversation – Talk to People!

Example Output – Priority Tiers
• Priority 1 – High Availability / Immediate Recovery: Services whose unavailability for more than a brief period can have a severe impact on customers or time-critical business operations.
• Priority 2 – 1-2 day recovery: Services whose unavailability significantly impacts customers or business operations.
• Priority 3 – 3-5 day recovery: Services which can tolerate up to five days of disruption in a disaster.
• Priority 4 – 6-10 day recovery: Services which can tolerate up to ten days of disruption in a disaster. Priority 3 and 4 systems may be restored in less time, depending on the situation. However, higher priority functions will be restored first.
• Priority 5 – “Best effort” recovery: Non-critical services which can tolerate two weeks or more of disruption in a disaster. These systems will be restored on a best-effort basis, after other more critical systems have been restored and ongoing operations have resumed. Priority 5 systems may be restored in less time, depending on the situation. However, higher priority functions will be restored first. In some cases, systems deemed to not be required for continued operations may not be restored.
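
As a rough illustration of how BIA output could map services to the tiers above, here is a minimal Python sketch. The recovery windows come from the tier descriptions; everything else is an assumption made for the example: the `Tier` class, the `assign_tier` function, modeling Priority 1 as a zero-day window, and choosing the least strict tier whose full recovery window still fits inside the downtime a service can tolerate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    priority: int
    label: str
    max_recovery_days: float  # longest recovery window the tier allows


# Windows taken from the tier descriptions. Priority 1 is modeled as
# 0 days ("immediate") and Priority 5 as unbounded; both are assumptions.
TIERS = [
    Tier(1, "High Availability / Immediate Recovery", 0),
    Tier(2, "1-2 day recovery", 2),
    Tier(3, "3-5 day recovery", 5),
    Tier(4, "6-10 day recovery", 10),
    Tier(5, "Best effort recovery", float("inf")),
]

# Priority 5 covers services that tolerate "two weeks or more".
BEST_EFFORT_THRESHOLD_DAYS = 14


def assign_tier(tolerable_downtime_days: float) -> Tier:
    """Pick the least strict tier whose recovery window fits the tolerance."""
    if tolerable_downtime_days < 0:
        raise ValueError("tolerable downtime cannot be negative")
    if tolerable_downtime_days >= BEST_EFFORT_THRESHOLD_DAYS:
        return TIERS[4]
    chosen = TIERS[0]
    for tier in TIERS[:4]:
        if tier.max_recovery_days <= tolerable_downtime_days:
            chosen = tier
    return chosen


if __name__ == "__main__":
    # Hypothetical services and tolerances, as a BIA interview might yield.
    for service, days in [("order entry", 0.1), ("reporting", 4), ("archive search", 21)]:
        tier = assign_tier(days)
        print(f"{service}: Priority {tier.priority} ({tier.label})")
```

A service that tolerates four days lands in Priority 2 under this rule, because a Priority 3 recovery could take up to five days. A real BIA would weigh cost and business input as well, not downtime tolerance alone.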

What is an SLA?
• A contract between an external service provider and its customers or between an IT department and the internal business units it serves.

What is an SLA?
• Two 9’s – 99% = 3.65 days of downtime per year (easy to achieve, less expensive)
• Three 9’s – 99.9% = 8.76 hours of downtime per year
• Four 9’s – 99.99% = 52.6 minutes of downtime per year
• Five 9’s – 99.999% = 5.26 minutes of downtime per year (difficult to achieve, expensive!)
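
The downtime figures above follow from simple arithmetic. A small Python sketch, assuming a 365-day year (the function name `downtime_hours_per_year` is illustrative only):

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in hours per year, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


if __name__ == "__main__":
    for nines in (99, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(nines)
        print(f"{nines}%: {hours / 24:.2f} days = {hours:.2f} hours = {hours * 60:.2f} minutes")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why cost and difficulty rise so quickly toward five nines.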

Disaster Recovery: Key Measures
• Recovery Point Objective (RPO): Amount of data lost from failure, measured as the amount of time from a disaster event
• Recovery Time Objective (RTO): Targeted amount of time to restart a business service after a disaster event
[Figure: hourly timeline from 5 a.m. to 7 p.m., with “Declare Disaster” marked at 10 a.m.]
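
To make the two measures concrete, here is a minimal Python sketch that computes the achieved recovery point and recovery time for one incident and compares them to targets. The 10 a.m. disaster time matches the slide. The date, the last-backup time, the restore time, the four-hour targets, and the function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, disaster_time: datetime) -> timedelta:
    """Data-loss window: time between the last good copy and the disaster."""
    return disaster_time - last_recovery_point


def achieved_rto(disaster_time: datetime, service_restored: datetime) -> timedelta:
    """Outage window: time between the disaster and the service restart."""
    return service_restored - disaster_time


if __name__ == "__main__":
    disaster = datetime(2017, 8, 28, 10, 0)     # disaster declared at 10 a.m.
    last_backup = datetime(2017, 8, 28, 6, 0)   # assumed last recovery point
    restored = datetime(2017, 8, 28, 15, 0)     # assumed service restart

    rpo_target = timedelta(hours=4)             # assumed targets
    rto_target = timedelta(hours=4)

    rpo = achieved_rpo(last_backup, disaster)
    rto = achieved_rto(disaster, restored)
    print(f"RPO achieved {rpo}, target met: {rpo <= rpo_target}")
    print(f"RTO achieved {rto}, target met: {rto <= rto_target}")
```

In this made-up incident the recovery point target is met (four hours of data lost) while the recovery time target is missed (five hours to restart), which shows why the two objectives need to be tracked separately.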

Disaster Recovery: Key Measures
[Figure: Cost plotted against Recovery Point and Recovery Time, each on a scale of Weeks, Days, Hours, Minutes, Seconds, with Real Time at the center.]

BC vs DR vs OR – Say What?
• Business Continuity
– All goes on as normal despite an incident
– Could lose a site and have no impact on business operations (active/active sites)
• Disaster Recovery
– To cope with & recover from an IT crisis that moves work to an alternative system in a non-routine way.
– A real “disaster” is large in scope and impact
– DR typically implies failure of the primary data center and recovery to an alternate site
• Operational Recovery
– Addresses more “routine” types of failures (server, network, storage, etc.)
– Events are smaller in scope and impact than a full disaster
– Typically implies recovering to alternate equipment within the primary data center
• Each should have its own clearly defined objectives – at minimum know the difference.

Where Rubrik Helps
Let’s keep it architecture focused.

Complexity is the Enemy
Whatever you do. Whatever you buy.
Simplify your Architecture & Expect More

Key Evaluation Criteria
What we’ve seen that makes a difference…
1. Reliability of Data Recovery
a. Simplicity of Setup and Day 2 Operations – SLA Policies!

Data Management: 1990s to Present / 2000s to Present
[Diagrams, labels as they appear on the slides: Backup & Replication Software; Backup Storage; Backup Software; Backup Servers; Backup Proxies; Replication; Catalog Database; Tape; Off-site Archive; Dedupe; Metadata.]

Meet Rubrik Cloud Data Management
[Diagram labels: Backup Software; Backup Servers; Backup Proxies; Replication; Catalog Database; Tape; Off-site Archive; Backup Storage; Dedupe; Metadata; Private; Public.]
Software fabric for orchestrating apps and data across clouds. No forklift upgrades.

How It Works
• Quick Start: Rack and go. Auto-discovery.
• Rapid Ingest: Flash-optimized, parallel ingest accelerates snapshots and eliminates stun. Content-aware dedupe. One global namespace.
• Automate: Intelligent SLA policy engine for effortless management.
• Instant Recovery: Live Mount VMs & SQL. Instant search and file restore.
• Secure: End-to-end encryption. Immutability to fight Ransomware.
• Cloud: “CloudOut” instantly accessible with global search. Launch apps with “CloudOn” for DR or test/dev. Run apps in cloud.
[Diagram labels: Primary Environment; SLA Policy Engine; Log Management; Private; Public; NAS; AHV; Hyper-V; VMware.]

Your Data Center Today
[Diagram labels: Production Servers; SAN; Backup Proxy; Backup Server; Search Server; Disk-Based Backup; Tape Archive; Offsite Tape Vault.]

Rubrik Simplifies Your Data Center
[Diagram labels: Production Servers; SAN; Scale Out Rubrik; Replication + Long-Term Retention + Search; Private.]

Data Management in the Cloud
[Diagram labels: On-Premises Applications & Data; Storage; Rubrik; Azure Instance; Blob Storage; Cloud-Native Applications & Data; EC2 Instance; Rubrik; Backup; Replication; Archival; Analytics.]

SLA
• Recovery Point Objective (RPO)
• Availability Duration (Retention)
• When to Archive (RTO)
• Replication Schedule (DR)
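
One way to picture the four settings listed above is as a single declarative policy object. The Python sketch below is illustrative only: the `SlaPolicy` class and its field names are invented for this example and are not Rubrik's actual data model or API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical policy object mirroring the four settings on the slide."""
    name: str
    snapshot_every: timedelta            # recovery point objective
    retain_for: timedelta                # availability duration (retention)
    archive_after: Optional[timedelta]   # when to archive; None means never
    replicate: bool                      # whether copies go to a DR site

    def __post_init__(self) -> None:
        if self.snapshot_every <= timedelta(0):
            raise ValueError("snapshot interval must be positive")
        if self.archive_after is not None and self.archive_after > self.retain_for:
            raise ValueError("cannot archive after retention has expired")

    def snapshots_retained(self) -> int:
        """Number of recovery points kept at steady state."""
        return self.retain_for // self.snapshot_every


if __name__ == "__main__":
    gold = SlaPolicy(
        name="gold-example",
        snapshot_every=timedelta(hours=4),
        retain_for=timedelta(days=30),
        archive_after=timedelta(days=7),
        replicate=True,
    )
    print(gold.name, "keeps", gold.snapshots_retained(), "recovery points")
```

Describing the desired outcome once and attaching it to many workloads is what keeps this approach simple compared with scheduling individual backup jobs.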

Let’s Demo!
What does it look like?

b. Immutability – is your data there when you need it?

Under the Hood
“The Interface”
“The Logic”
“The Core”
Distributed Task Framework
Callisto
Distributed Metadata Service
Cluster Management
Global Search
Cerebro
Data Management
Crystal
UI / API
Infinity
Ecosystem
Integration
Thor
Cloud Connect
Atlas
Cloud-Scale File System
NFS

2. Speed of Data Recovery
a. Search + Live Mount

Let’s Demo!
What does it look like?

Rubrik Backup / Recovery + DR
[Diagram labels: Production Servers; SAN; Rubrik (Backup S/W + Dedupe Storage); Replication + Long-Term Retention + Search; Private; DR Servers; Rubrik (Replication & DR).]

2. Speed of Data Recovery
a. Search + Live Mount
b. API Usage / Automation to enhance restore capabilities

Oh… By the Way
Your App
Use an API-first platform to create powerful automation workflows that can be integrated with any service that supports outbound REST
Now OpenAPI
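
To show the general shape of such an automation step, here is a minimal Python sketch that builds (but does not send) a REST request using only the standard library. The endpoint path, the JSON field names, the host, and the bearer-token scheme are all hypothetical placeholders, not Rubrik's documented API; consult the product's published API description for the real resources.

```python
import json
import urllib.request


def build_sla_assignment_request(
    base_url: str, token: str, vm_id: str, sla_name: str
) -> urllib.request.Request:
    """Build a POST request asking a backup service to protect one VM.

    The URL path and payload fields are hypothetical placeholders.
    """
    payload = json.dumps({"vmId": vm_id, "slaName": sla_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/hypothetical/v0/sla-assignments",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_sla_assignment_request(
        "https://backup.example.com", "EXAMPLE-TOKEN", "vm-123", "gold-example"
    )
    # Sending is left out on purpose; urllib.request.urlopen(request) would
    # perform the call against a real endpoint.
    print(request.get_method(), request.full_url)
```

Because the step is an ordinary HTTP request, it can be called from a provisioning workflow, a ticketing system, or any other tool that can make outbound REST calls.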

One More Demo!
Wait a minute…we’ve been doing them already.

What did you see?
• Easy Integration with vSphere
• Working with an SLA Policy
• Real-time Data Search

Don’t Backup. Go Forward.

Andrew Miller | andrew@rubrik.com | @andriven
Rebecca Fitzhugh | rebecca@rubrik.com | @rebeccafitzhugh
