1. Reference Architectures: Architecting Ceph Storage Solutions
Brent Compton, Director, Storage Solutions, Red Hat
Kyle Bader, Senior Solution Architect, Red Hat
2. RefArch Building Blocks
• Servers and Media (HDD, SSD, PCIe)
• Network
• OS/Virt Platform (bare metal, OpenStack virt, container virt, other virt)
• Ceph
• Defined Workloads
4. Design Considerations
1. Qualify need for scale-out storage
2. Design for target workload IO profile(s)
3. Choose storage access method(s)
4. Identify capacity
5. Determine fault-domain risk tolerance
6. Select data protection method
• Target server and network hardware architecture (performance and sizing)
5. 1. Qualify Need for Scale-out
• Elastic provisioning across storage server cluster
• Data HA across ‘islands’ of scale-up storage servers
• Standardized servers and networking
• Performance and capacity scaled independently
• Incremental v. forklift upgrades
8. 2. Design for Workload IO
• Performance v. ‘cheap-and-deep’?
• Performance: throughput v. IOPS intensive? (see the sketch below)
• Sequential v. random?
• Small block v. large block?
• Read v. write mix?
• Latency: absolute v. consistent targets?
• Sync v. async?
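Throughput targets and IOPS targets are two views of the same workload, linked by block size. A minimal sketch of that relationship (Python; the numbers are illustrative only, not benchmark results from this deck):

```python
# Relationship between IOPS, block size, and throughput.
# Figures below are illustrative, not benchmark results.

def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """MB/s delivered by a given IOPS rate at a fixed block size."""
    return iops * block_size_kb / 1024

# Small-block random IO: IOPS-intensive, modest throughput.
print(throughput_mb_s(iops=10_000, block_size_kb=4))    # ~39 MB/s
# Large-block sequential IO: throughput-intensive, modest IOPS.
print(throughput_mb_s(iops=500, block_size_kb=4096))    # 2000 MB/s
```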
14. 5. Fault-Domain Risk Tolerance
• What % of cluster capacity do you want on a single node? When a server fails:
• More workload performance impairment during backfill/recovery with fewer nodes in the cluster (each node has a greater % of its compute/IO utilization devoted to recovery).
• Larger % of the cluster’s reserve storage capacity utilized during backfill/recovery with fewer nodes in the cluster (must reserve a larger % of capacity for recovery with fewer nodes).
• Guidelines (see the sketch after this slide):
– Minimum supported (RHCS): 3 OSD nodes per Ceph cluster.
– Minimum recommended (performance cluster): 10 OSD nodes per cluster (1 node represents <10% of total cluster capacity).
– Minimum recommended (cost/capacity cluster): 7 OSD nodes per cluster (1 node represents <15% of total cluster capacity).
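A minimal sketch of the per-node capacity share behind these guidelines, assuming uniformly sized nodes (a simplification; real clusters weight nodes via CRUSH and should use actual utilization figures):

```python
# Fraction of total cluster capacity held by one node, assuming uniform
# nodes. This is the share that must be re-replicated (and absorbed by
# the remaining nodes) when that node fails.

def node_share(nodes: int) -> float:
    return 1.0 / nodes

for n in (3, 7, 10):
    print(f"{n} OSD nodes: one node holds {node_share(n):.1%} of capacity")
# 3 nodes -> 33.3%, 7 nodes -> 14.3% (<15%), 10 nodes -> 10.0% (<=10%)
```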
15. 6. Data Protection Schemes
• Replication
• Erasure Coding (analogous to network RAID)
One of the biggest choices affecting TCO in the entire solution!
16. Data Protection Schemes
• Replication
– 3x rep over JBOD = 33% usable:raw capacity ratio
• Erasure Coding (analogous to network RAID)
– 8+3 over JBOD = 73% usable:raw
17. Data Protection Schemes
• Replication
– Ceph block storage default: 3x rep over JBOD disks.
– Gluster file storage default: 2x rep over RAID6 bricks.
• Erasure Coding (analogous to network RAID)
– Data encoded into k chunks with m parity chunks and spread onto different disks (frequently on different servers). Can tolerate m disk failures without data loss. 8+3 popular. (See the sketch below for how the usable:raw ratios follow from k and m.)
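The usable:raw ratios quoted on the previous slide fall directly out of the protection parameters. A minimal sketch (ignores Ceph's full-ratio reserve and other overheads):

```python
# Usable:raw capacity ratio for the two protection schemes.

def replication_ratio(copies: int) -> float:
    # Each object is stored 'copies' times.
    return 1.0 / copies

def erasure_ratio(k: int, m: int) -> float:
    # k data chunks plus m parity chunks per object; survives m disk losses.
    return k / (k + m)

print(f"3x replication: {replication_ratio(3):.0%} usable:raw")  # ~33%
print(f"EC 8+3:         {erasure_ratio(8, 3):.0%} usable:raw")   # ~73%
```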
18. Target Cluster Hardware
• Cluster sizes: OSP Starter (100TB), S (500TB), M (1PB), L (2PB)
• Workload categories: IOPS Optimized, Throughput Optimized, Cost-Capacity Optimized
19. RefArch Examples
• Following are extracts from the recently published Ceph on Supermicro RefArch
• Based on lab benchmarking results from many different configurations
28. Add'l Subsystem Guidelines
• Server chassis size
• CPU
• Memory
• Disk
• SSD Write Journals (Ceph only)
• Network
(A rough per-node sanity-check sketch follows this slide.)
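A rough per-OSD-node sanity check tying these subsystems together. The target ranges in the comments are commonly cited rules of thumb assumed for illustration, not figures from this deck; substitute the values from the RefArch sizing tables for your workload tier:

```python
# Per-OSD-node ratio check. All target ranges in comments are assumed
# rules of thumb, not values from the RefArch itself.

def osd_node_check(hdds: int, hdd_tb: float, journal_ssds: int,
                   ram_gb: int, cores: int, nic_gbps: int) -> None:
    print(f"raw capacity per node:    {hdds * hdd_tb:.0f} TB")
    print(f"HDD OSDs per journal SSD: {hdds / journal_ssds:.1f}")   # assumed ~4-5
    print(f"RAM per OSD:              {ram_gb / hdds:.1f} GB")      # assumed >= ~2 GB
    print(f"cores per OSD:            {cores / hdds:.2f}")          # assumed ~0.5-1
    print(f"NIC bandwidth per OSD:    {nic_gbps / hdds:.2f} Gb/s")

# Hypothetical 12-bay chassis: 12x 4TB HDD, 3 journal SSDs, 64GB RAM,
# 12 cores, 2x 10GbE bonded.
osd_node_check(hdds=12, hdd_tb=4, journal_ssds=3,
               ram_gb=64, cores=12, nic_gbps=20)
```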
29. RefArchs & Whitepapers
• See Ceph on Supermicro Portfolio RefArch
http://www.redhat.com/en/resources/red-hat-ceph-storage-clusters-supermicro-storage-servers
• See Ceph on Cisco UCS C3160 Whitepaper
http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/whitepaper-C11-735004.html
• See Ceph on Scalable Informatics Whitepaper
https://www.scalableinformatics.com/assets/documents/Unison-Ceph-Performance.pdf