Randy Bias, Co-Founder and CTO of Cloudscaling, speaks on open storage, fault tolerance and the concept of failure "blast radius" at the Open Storage Summit, hosted by Nexenta in May 2012.
1. Cloud Storage Futures (previously: Designing Private & Public Clouds)
May 22nd, 2012
Randy Bias, CTO & Co-founder
@randybias
CC Attribution-NoDerivs 3.0 Unported License - usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses
17. Scale-out Principles
• Small failure domains
• Risk acceptance vs. risk mitigation
• More boxes for throughput & redundancy
• Assume the app manages complexity:
• Data replication
• Assume the infrastructure is unreliable:
• Server & data redundancy
• Geo-distribution
• Auto-scaling
18. What’s a failure domain?
• “Blast radius” during a failure
• What is impacted?
• Public SAN failures:
• FlexiScale SAN failure in 2007
• UOL Brazil in 2011:
• http://goo.gl/8ct9n
• There are many more
• Enterprise HA ‘pairs’ typically support BIG failure domains
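The “blast radius” idea can be made concrete with simple arithmetic: if capacity is split across N independent failure domains, losing one domain touches roughly 1/N of the data. A minimal sketch (the capacity figures are illustrative, not from the talk):

```python
def blast_radius(total_tb: float, failure_domains: int) -> float:
    """Fraction of all data impacted when one failure domain fails,
    assuming capacity is spread evenly across the domains."""
    return (total_tb / failure_domains) / total_tb

# One big HA SAN pair: a single failure domain holds everything.
print(blast_radius(500, 1))   # 1.0 -> 100% of the data affected

# The same capacity split into 50 small scale-out domains.
print(blast_radius(500, 50))  # 0.02 -> one failure touches 2%
```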
19. Two Diff Arches for Two Kinds of Apps

                            Elastic Scale-out Cloud   “Enterprise” Cloud
Location of complexity      App                       Infra
Primary scaling dimension   Out                       Up
21. Two Diff Storages for Two Kinds of Clouds
“Classic” “Scale-out”
Storage Storage
Uptime in Infra Uptime in apps
Every part is redundant Minimal h/w redundancy
Data mgmt in Infra Data mgmt in apps
Bigger SAN/NAS/DFS Smaller failure domains
15
22. Difference in Tiers

Tier   $      Purpose             Classic                          Scale-out
1      $$$$   Mission Critical    SAN, then NAS; 10-15K RPM; SSD   On-demand SAN (EBS); DynamoDB (AWS); variable service levels
2      $$     Important           NAS, then SAN; 7.2K RPM          DAS; app / DFS to scale out
3      $      Archive & Backups   Tape; nearline 5.4K              Object Storage
23. The Biggest Difference Is Where Data Management Resides
• In scale-out systems, apps are managing the data:
• Riak / scale-out distributed data store
• Hadoop+HDFS / scale-out distributed computation system
• Cassandra / scale-out distributed columnar database
24. Cassandra / Netflix use case
• 3 x Replication
• Linearly scaling performance
• 50 - 300 nodes
• > 1M writes/second
• When is this perfect?
• data size unknown
• growth unknown
• lots of elastic dynamism
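The theme of this slide, that the app rather than the SAN owns replication, can be sketched as a Cassandra-style write path: send each write to three replicas and acknowledge the client at quorum. Everything here (the node dicts, the function name, the quorum of 2) is an illustrative toy, not Netflix's or Cassandra's actual code:

```python
import hashlib

def replicate_write(key, value, nodes, rf=3, quorum=2):
    """App-managed replication: write to `rf` replicas chosen from the
    key, succeed once `quorum` of them acknowledge. Nodes are plain
    dicts standing in for real servers."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    replicas = [nodes[(start + i) % len(nodes)] for i in range(rf)]
    acks = 0
    for node in replicas:
        node[key] = value   # "send" the write to this replica
        acks += 1           # pretend the replica acked
    # A real client returns as soon as `quorum` acks arrive; the
    # remaining replicas catch up in the background.
    return acks >= quorum

cluster = [dict() for _ in range(5)]
ok = replicate_write("user:42", "profile-data", cluster)
print(ok, sum(1 for n in cluster if "user:42" in n))  # True 3
```

Because the app picks replicas from the key, per-node work stays constant as nodes are added, which is what gives the linear scaling and constant client write times cited above.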
25. Cassandra / Netflix use case
• DAS (‘ephemeral store’)
• Per node perf is constant
• disk
• CPU
• network
• Client write times constant
• Nothing special here
26. Cassandra / Netflix use case
• On-demand & app-managed
• Cost per GB/hr: $.006
• Cost per GB/mo: $4.14
• Includes: storage, DB, storage admin, network, network admin, etc.
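As a sanity check, the hourly and monthly figures above can be related directly. The roughly 690 billed hours per month implied below is inferred from the slide's two numbers, not stated in the talk:

```python
gb_hour_cost = 0.006   # $/GB-hour, from the slide
gb_month_cost = 4.14   # $/GB-month, from the slide

# Billed hours per month implied by the two quoted rates:
implied_hours = gb_month_cost / gb_hour_cost
print(round(implied_hours))  # 690, a bit under a full 720-744 hour month
```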
29. There are a few basic approaches being taken ...
30. Scale-out SAN
• In-rack SAN == faster, bigger DAS w/ better stat-muxing
• Accept normal DAS failure rates
• Assume app handles data replication
• Like AWS ‘ephemeral storage’
• KT architecture
• Customers didn’t “get it”
• “Ephemeral SAN” not well understood
[Design callout: dedicated storage SW, 9K jumbo frames, SSD caches (ZIL/L2ARC), no replication, max HW redundancy]
31. AWS EBS - “Block-devices-as-a-Service”
• Scale-out SAN (sort of)
• Block scheduler
• Async replication
• Some failure tolerance
• Scheduler:
• Allocates customer block devices across many failure domains
• Customers run RAID inside VMs to increase redundancy
[Diagram: API → Cloud Control System → EBS scheduler → EBS clusters over the core network, with inter-rack and intra-rack async replication between racks]
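The scheduler bullet is the interesting part: the point of EBS-style placement is that one customer's block devices should land in different failure domains. A minimal sketch of such a placement policy; the class and method names are illustrative, not AWS's implementation:

```python
class BlockScheduler:
    """Toy EBS-style scheduler: spread each customer's volumes
    across distinct failure domains (e.g. racks)."""

    def __init__(self, failure_domains):
        self.domains = list(failure_domains)
        self.placements = {}  # customer -> list of domains already used

    def allocate(self, customer):
        used = self.placements.setdefault(customer, [])
        # Prefer a domain this customer is not already using.
        for domain in self.domains:
            if domain not in used:
                used.append(domain)
                return domain
        # All domains in use: fall back to this customer's least-loaded one.
        domain = min(self.domains, key=used.count)
        used.append(domain)
        return domain

sched = BlockScheduler(["rack-1", "rack-2", "rack-3"])
print([sched.allocate("cust-a") for _ in range(4)])
# ['rack-1', 'rack-2', 'rack-3', 'rack-1']
```

The first three volumes land in distinct racks; only the fourth is forced to share, which is why customers still layer RAID inside the VM for extra redundancy.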
32. DAS + Big Data (Storage + Compute + DFS)
• Storage capability:
• Replication
• Disk & server failure
• Data rebalancing
• Data locality
• rack awareness
• Checksums (basic)
• Also:
• Built in computation
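The data-locality and rack-awareness bullets can be illustrated with the classic HDFS-style placement rule: first replica on the writer's node, second on a different rack, third on that second rack but a different node. A rough sketch; the node and rack names are invented for illustration:

```python
def place_replicas(writer, nodes_by_rack):
    """HDFS-style 3-replica placement.
    nodes_by_rack: dict mapping rack name -> list of node names."""
    local_rack = next(r for r, ns in nodes_by_rack.items() if writer in ns)
    # Replica 1: the writer's own node (data locality for computation).
    replicas = [writer]
    # Replica 2: a node on a *different* rack (survives a rack failure).
    remote_rack = next(r for r in nodes_by_rack if r != local_rack)
    replicas.append(nodes_by_rack[remote_rack][0])
    # Replica 3: same remote rack, different node (cheap third copy).
    replicas.append(nodes_by_rack[remote_rack][1])
    return replicas

racks = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2"]}
print(place_replicas("a1", racks))  # ['a1', 'b1', 'b2']
```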
33. Distributed File Systems (DFS) over DAS
• Storage capability:
• Replication
• Disk & server failure
• Data rebalancing
• Checksums (w/ btrfs)
• Block devices
• Also:
• No computation
[Diagram: ceph architecture✴]
✴ Looks familiar, doesn’t it?
34. Why is DFS at the Physical Layer Dangerous for Scale-out?
DFS == [slide image]
36. Object Storage
• Storage capability:
• Replication
• Disk & server failure
• Data rebalancing
• Checksums (sometimes)
• Also:
• Looks like a big web app
• Uses a DHT/CHT to ‘index’ blobs
• Very simple
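The DHT/CHT bullet is the core mechanism: an object store hashes each blob name onto a ring of storage nodes, so any front-end can locate a blob without a central index. A minimal consistent-hashing sketch, with assumed node names (a production system would also use virtual nodes for balance):

```python
import hashlib
from bisect import bisect

class HashRing:
    """Toy consistent-hash ring for locating blobs on storage nodes."""

    def __init__(self, nodes):
        # Each node owns the ring segment ending at its hash point.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, blob_name):
        h = self._hash(blob_name)
        points = [p for p, _ in self.ring]
        i = bisect(points, h) % len(self.ring)  # wrap around the ring
        return self.ring[i][1]

ring = HashRing(["store-1", "store-2", "store-3"])
print(ring.node_for("photos/cat.jpg"))  # every front-end computes the same node
```

Because the lookup is pure hashing, the "index" costs nothing to replicate across front-ends, which is what makes the design "very simple" and web-app-like.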
37. Where does OpenStorage Fit?

Scale-out Solution   Purpose / Tier   Virtual or Physical?   Fit
Scale-out SAN        Tier-1/2         Physical               In-rack SAN
EBS                  Tier-1           Physical               EBS clusters (scale-out SAN)
DAS+BigData          Tier-2           Virtual                Reliable, bit-rot resistant DAS
DAS+DFS              Tier-2           Physical / Virtual     Reliable, bit-rot resistant DAS (unproven)
DAS+DB               Tier-2           Virtual                In-VM reliable DAS
Object Storage       Tier-3           Physical               Reliable, bit-rot resistant DAS
38. Summarizing ZFS Value in Scale-out
• Data integrity & bit rot are issues that few solve today
• Most SAN/NAS solutions don’t ‘scale down’
• Commodity x86 servers are winning
• There are two scale-out places where ZFS wins:
• Small SAN clusters
• Best DAS management
40. Conclusions / Speculations
• Build the right cloud
• Which means the right storage for *that* cloud
• A single cloud might support both ...
• Open storage can be used for both ...
• ... WITH the appropriate design/forethought