TOTVS – All rights reserved
Ceph
Demystifying Software-Defined Storage
August 2017
01
What is SDS?
Software-Defined Storage (SDS)
[Diagram: applications and users reach a shared storage pool through a software-defined data layer running on commodity hardware]
SDS HCI
ONTAP Select
Software-Defined Storage / Hyper-Converged Infrastructure
02
Ceph
Ceph - Distributed Storage System
• Multi-protocol
• Scalable
• Resilient
• Self-managing
• Fault-tolerant
• No single point of failure
Architecture
RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS: a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD: a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS: a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW: a bucket-based REST gateway, compatible with S3 and Swift
Access: APP (LIBRADOS / RADOSGW), HOST/VM (RBD), CLIENT (CEPH FS)
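As a rough illustration of the LIBRADOS layer, the sketch below uses the Python rados binding to store and read back one object; the pool name, object name, and ceph.conf path are assumptions for the example, not part of the deck.

import rados

# Connect using a local ceph.conf (path is an assumption).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context on a hypothetical pool named "rbd-demo".
ioctx = cluster.open_ioctx('rbd-demo')
try:
    ioctx.write_full('hello-object', b'Hello from librados')  # store an object
    print(ioctx.read('hello-object'))                          # read it back
finally:
    ioctx.close()
    cluster.shutdown()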
Components
OSDs, Monitors, MDS
OSDs
• One per disk
• Storage
• Replication
• Management
• Recovery
• Rebalancing
• Self-checking
[Diagram: one OSD daemon per disk, each OSD on top of a local filesystem (btrfs, xfs*, ext4), with the monitors (M) alongside]
Monitors
• Maps
  • Cluster
  • Monitors
  • OSDs
  • Placement Groups
  • CRUSH
• 3x per cluster
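A minimal sketch of asking the monitors for the cluster state through the same Python rados binding (mon_command sends the same JSON requests the ceph CLI uses; the conf path is an assumption):

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Request the cluster status, which summarizes the maps the monitors maintain.
cmd = json.dumps({"prefix": "status", "format": "json"})
ret, outbuf, errs = cluster.mon_command(cmd, b'')
status = json.loads(outbuf)
print(sorted(status))  # e.g. health, monmap, osdmap, pgmap, ...

cluster.shutdown()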
MDS - Metadata Server
• Used only with CephFS
• POSIX-compliant
• Metadata management
  • Directory hierarchy
  • File metadata
• 3x per cluster
03
Data Distribution
CRUSH
• Pseudo-random
• Fast
• Deterministic
• Uniform
• Reduces data movement
• Rule-based
Algorithm
An object is mapped to storage in two steps, which any client can compute on its own:
hash(object name) % num_pg → placement group (PG)
CRUSH(pg, cluster state, rule set) → set of OSDs
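A toy sketch of that two-step mapping in Python. This is not the real CRUSH implementation: the hash below stands in for Ceph's rjenkins hash, and the OSD selection ignores the cluster map and failure domains; it only shows that placement is computed, not looked up.

import hashlib

def pg_for_object(object_name, num_pg):
    # Step 1: hash the object name and take it modulo the pool's PG count.
    digest = int(hashlib.sha1(object_name.encode()).hexdigest(), 16)
    return digest % num_pg

def osds_for_pg(pg, osds, replicas=3):
    # Step 2 (stand-in for CRUSH): deterministically rank OSDs for this PG
    # and take the first `replicas`; real CRUSH also applies placement rules.
    ranked = sorted(osds, key=lambda osd: hashlib.sha1(f"{pg}-{osd}".encode()).hexdigest())
    return ranked[:replicas]

pg = pg_for_object("vm-disk-0001", num_pg=128)
print(pg, osds_for_pg(pg, osds=[0, 1, 2, 3, 4, 5]))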
Placement Groups
• Maps PGs to OSDs
• Low CPU usage
• Fewer calculations
• Less metadata
• Dynamic balancing
Placement Groups
• Defined per pool
• Balancing of each pool across the OSDs
• ~100 PGs per OSD
• More PGs reduce the load per OSD
Placement Groups - http://ceph.com/pgcalc/
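Roughly what pgcalc does for a single pool, assuming the ~100 PGs per OSD target above; the cluster size and replica count below are made-up examples.

def suggested_pg_count(num_osds, replica_size, target_pgs_per_osd=100):
    # Rule of thumb: (OSDs * target PGs per OSD) / replica size,
    # rounded up to the next power of two. pgcalc also weighs each
    # pool by its expected share of the cluster's data.
    raw = (num_osds * target_pgs_per_osd) / replica_size
    power = 1
    while power < raw:
        power *= 2
    return power

# 12 OSDs, 3x replication: (12 * 100) / 3 = 400 -> 512 PGs.
print(suggested_pg_count(num_osds=12, replica_size=3))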
Pools
Replicated Pool
• 3x copies (higher overhead)
• Higher durability
• Faster recovery
Erasure Coded Pool
• Data + parity (lower overhead)
• Better cost/benefit
• Longer recovery time
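A quick feel for the overhead difference, comparing 3x replication with a hypothetical 4+2 erasure-coded profile (any k+m split works the same way):

def usable_replicated(copies):
    # Every object is stored `copies` times, so usable capacity is 1/copies.
    return 1 / copies

def usable_erasure(k, m):
    # k data chunks plus m coding chunks: usable capacity is k / (k + m).
    return k / (k + m)

print(usable_replicated(3))  # 0.33 -> roughly 200% overhead
print(usable_erasure(4, 2))  # 0.67 -> roughly 50% overhead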
04
Data Access
Architecture
The same RADOS stack, now seen from the access side: LIBRADOS and RADOSGW serve applications, RBD serves hosts/VMs, and CEPH FS serves file clients.
RadosGW - Rados Object Gateway
APP → RADOSGW → LIBRADOS → RADOS (socket, Cephx authentication)
• REST interface
  • Data
  • Management
• Amazon S3
• OpenStack Swift
• Embedded web server
• NFS (experimental)
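Because RGW exposes an S3-compatible API, a generic S3 client can talk to it; a minimal sketch with boto3, where the endpoint, keys, and bucket name are placeholders for illustration:

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.local:7480",   # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",                 # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"stored via RadosGW")
print(s3.get_object(Bucket="demo-bucket", Key="hello.txt")["Body"].read())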
RBD - Rados Block Device
HOST / VIRTUALIZATION → LIBRBD → LIBRADOS → RADOS (socket, Cephx authentication); the VM sees a plain disk
• Disk interface
• Native in the Linux kernel
• Thin-provisioned
• Snapshots
• Mirroring
• OpenStack
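A small sketch of the librbd path using the Python rbd binding: create a thin-provisioned image and snapshot it. Pool and image names are assumptions for the example.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')  # hypothetical pool

# RBD images are thin-provisioned: this 1 GiB image consumes no space until written.
rbd.RBD().create(ioctx, 'demo-image', 1 * 1024**3)
with rbd.Image(ioctx, 'demo-image') as image:
    image.create_snap('first-snap')

ioctx.close()
cluster.shutdown()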
05
OpenStack + Ceph
RBD + OpenStack
Source: http://ceph.com
[Diagram: Glance images, Cinder block devices, and Nova ephemeral disks all live in RBD; the Cinder volumes and Nova disks are copy-on-write (COW) clones of the Glance image]
RBD + OpenStack - Snapshot / Clone
• VM-1 holds 100 blocks; a snapshot plus instant copies (clones) create VM-2 ... VM-5 with 0 blocks of their own, so total usage stays at 100.
• When a client writes 5 new blocks to VM-2, only those blocks are stored (copy-on-write); total usage grows to 105 instead of five full copies.
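The OpenStack drivers lean on exactly this copy-on-write behaviour; a minimal sketch of the same snapshot/protect/clone flow with the Python rbd binding (pool and image names are assumptions, and the parent image is expected to have the layering feature enabled):

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')  # hypothetical pool holding the base image

# Snapshot the base image and protect the snapshot so it can be cloned.
with rbd.Image(ioctx, 'vm-1') as base:
    base.create_snap('golden')
    base.protect_snap('golden')

# Copy-on-write clone: 'vm-2' starts with zero blocks of its own.
rbd.RBD().clone(ioctx, 'vm-1', 'golden', ioctx, 'vm-2')

ioctx.close()
cluster.shutdown()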
RBD + OpenStack
Questions
totvs.com
@totvs
blog.totvs.com
company/totvs
fluig.com
Thank you
Italo Santos
Cloud - Storage Infrastructure
italo.ssantos@totvs.com.br
#SOMOSTOTVERS

Ceph - Demystifying Software-Defined Storage