WHAT’S NEW IN LUMINOUS AND BEYOND
SAGE WEIL – RED HAT
2017.11.06
UPSTREAM RELEASES

Release timeline:
– Jewel (LTS, 10.2.z) – Spring 2016
– Kraken – Fall 2016
– Luminous (12.2.z) – Summer 2017 ← WE ARE HERE
– Mimic (13.2.z) – Spring 2018
– Nautilus (14.2.z) – Winter 2019

● New release cadence
● Named release every 9 months
● Backports for 2 releases
● Upgrade up to 2 releases at a time (e.g., Luminous → Nautilus)
LUMINOUS
BLUESTORE: STABLE AND DEFAULT

● New OSD backend
  – consumes raw block device(s) – no more XFS
  – embeds rocksdb for metadata
● Fast on both HDDs (~2x) and SSDs (~1.5x)
  – similar to FileStore on NVMe, where the device is not the bottleneck
● Smaller journals
  – happily uses fast SSD partition(s) for internal metadata, or NVRAM for journal
● Full data checksums (crc32c, xxhash, etc.)
● Inline compression (zlib, snappy)
  – policy driven by global or per-pool config, and/or client hints (example below)
● Stable and default
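For illustration, a minimal sketch of the per-pool compression policy (the pool name mypool is a placeholder; the options are the standard Luminous pool settings):

  ceph osd pool set mypool compression_algorithm snappy    # or zlib
  ceph osd pool set mypool compression_mode aggressive     # none | passive | aggressive | force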
HDD: RANDOM WRITE

[Charts: BlueStore vs FileStore HDD random write throughput (MB/s) and IOPS across IO sizes from 4 to 4096, comparing Filestore against several Bluestore builds (wip-bitmap-alloc-perf, master a07452d, master d62a4948, wip-bluestore-dw).]
HDD: MIXED READ/WRITE

[Charts: BlueStore vs FileStore HDD random read/write throughput (MB/s) and IOPS across IO sizes from 4 to 4096, same Filestore and Bluestore builds as above.]
RGW ON HDD+NVME, EC 4+2

[Chart: 4+2 erasure coding RadosGW write tests – 32MB objects, 24 HDD/NVMe OSDs on 4 servers, 4 clients. Throughput (MB/s) for 1/4/128/512 buckets on 1 and 4 RGW servers, plus rados bench, comparing Filestore and Bluestore with 512KB and 4MB chunks.]
RBD OVER ERASURE CODED POOLS

● aka erasure code overwrites
● requires BlueStore to perform reasonably
● significant improvement in efficiency over 3x replication
  – 2+2 → 2x, 4+2 → 1.5x
● small writes slower than replication
  – early testing showed 4+2 is about half as fast as 3x replication
● large writes faster than replication
  – less IO to device
● implementation still does the “simple” thing
  – all writes update a full stripe
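A minimal sketch of the workflow (pool and image names are placeholders): create an EC data pool, allow overwrites, and keep the image metadata in a replicated pool:

  ceph osd pool create rbd-ec-data 64 64 erasure
  ceph osd pool set rbd-ec-data allow_ec_overwrites true    # requires BlueStore OSDs
  ceph osd pool application enable rbd-ec-data rbd
  rbd create --size 1T --data-pool rbd-ec-data rbd/myimage  # header/metadata stay in the replicated 'rbd' pool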
CEPH-MGR

● ceph-mgr
  – new management daemon to supplement ceph-mon (monitor)
  – easier integration point for python management logic
  – integrated metrics
● ceph-mon scaling
  – offload pg stats from mon to mgr
  – validated 10K OSD deployment (“Big Bang III” @ CERN)
● restful: new REST API
● prometheus, influx, zabbix
● dashboard: built-in web dashboard
  – webby equivalent of 'ceph -s'

(??? – time for new iconography)
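For reference, a short sketch of enabling mgr modules from the CLI (module availability depends on the build):

  ceph mgr module enable dashboard
  ceph mgr module enable prometheus
  ceph mgr services        # list the URLs exposed by enabled modules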
CEPH-METRICS

[Screenshot slides.]
ASYNCMESSENGER

● new network Messenger implementation
  – event driven
  – fixed-size thread pool
● RDMA backend (ibverbs)
  – built by default
  – limited testing, but seems stable! (config sketch below)
● DPDK backend
  – prototype!
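A rough ceph.conf sketch for experimenting with the RDMA backend (the device name is an example and environment-specific; the default messenger remains async+posix):

  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx5_0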
PERFECTLY BALANCED OSDS (FINALLY!)

● CRUSH choose_args
  – alternate weight sets for individual rules
  – fixes two problems:
    ● imbalance – run numeric optimization to adjust weights to balance PG distribution for a pool (or cluster)
    ● multipick anomaly – adjust weights per position to correct for low-weighted devices (e.g., mostly empty rack)
  – backward compatible ‘compat weight-set’ for imbalance only
● pg up-map
  – explicitly map individual PGs to specific devices in OSDMap
  – requires luminous+ clients
● Automagic, courtesy of ceph-mgr ‘balancer’ module
  – ceph balancer mode crush-compat
  – ceph balancer on
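The upmap variant is similar, but needs every client to be Luminous or newer; a sketch:

  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status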
RADOS MISC

● CRUSH device classes (example below)
  – mark OSDs with class (hdd, ssd, etc)
  – out-of-box rules to map to a specific class of devices within the same hierarchy
● streamlined disk replacement
● require_min_compat_client – simpler, safer configuration
● annotated/documented config options
● client backoff on stuck PGs or objects
● better EIO handling
● peering and recovery speedups
● fast OSD failure detection
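For example (rule and pool names are placeholders), a device class can drive placement without building a separate CRUSH hierarchy:

  ceph osd crush class ls                                   # e.g. hdd, ssd, nvme
  ceph osd crush rule create-replicated fast-ssd default host ssd
  ceph osd pool set mypool crush_rule fast-ssd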
[RADOSGW feature overview: S3, Swift, erasure coding, multisite federation, multisite replication, NFS, encryption, tiering, deduplication, compression.]
RGW METADATA SEARCH

[Diagram: zones A, B, and C, each on its own cluster (A, B, C) running RADOSGW over LIBRADOS, synchronized over REST for multisite metadata search.]
RGW MISC

● NFS gateway
  – NFSv4 and v3
  – full object access (not general purpose!)
● dynamic bucket index sharding
  – automatic (finally!)
● inline compression (example below)
● encryption
  – follows S3 encryption APIs
● S3 and Swift API odds and ends
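A hedged example (zone and placement names shown are the defaults; the bucket name is a placeholder): compression is a per-placement zone setting, and resharding can also be driven manually:

  radosgw-admin zone placement modify --rgw-zone=default \
      --placement-id=default-placement --compression=zlib
  radosgw-admin bucket reshard --bucket=mybucket --num-shards=64   # dynamic resharding normally handles this automatically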
[RBD feature overview: erasure coding, multisite mirroring, persistent client cache, consistency groups, encryption, iSCSI, trash.]
RBD

● RBD over erasure coded pool
  – rbd create --data-pool <ecpoolname> ...
● RBD mirroring improvements (sketch below)
  – cooperative HA daemons
  – improved Cinder integration
● iSCSI
  – LIO tcmu-runner, librbd (full feature set)
● Kernel RBD improvements
  – exclusive locking, object map
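A minimal mirroring sketch on the primary cluster (pool rbd, image myimage; the peer name is a placeholder, and the secondary cluster must run rbd-mirror daemons):

  rbd mirror pool enable rbd pool                       # mirror all journaled images in the pool
  rbd feature enable rbd/myimage journaling             # journaling is required for mirroring
  rbd mirror pool peer add rbd client.rbd-mirror-peer@backup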
CEPHFS

● multiple active MDS daemons (finally!)
● subtree pinning to specific daemon (example below)
● directory fragmentation on by default
  – (snapshots still off by default)
● so many tests
● so many bugs fixed
● kernel client improvements
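For example (filesystem name cephfs and mount point /mnt/cephfs are placeholders):

  ceph fs set cephfs max_mds 2                         # allow a second active MDS rank
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects   # pin this subtree to MDS rank 1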
MIMIC
[Mimic themes: OSD refactor, tiering, client-side caches, metrics, dedup, QoS, self-management, multi-site federation.]
CONTAINERS CONTAINERS

● ceph-container (https://github.com/ceph/ceph-container) (sketch below)
  – multi-purpose container image
  – plan to publish upstream releases as containers on download.ceph.com
● ceph-helm (https://github.com/ceph/ceph-helm)
  – Helm charts to deploy Ceph in kubernetes
● openshift-ansible
  – deployment of Ceph in OpenShift
● mgr-based dashboard/GUI
  – management and metrics (based on ceph-metrics, https://github.com/ceph/ceph-metrics)
● leverage kubernetes for daemon scheduling
  – MDS, RGW, mgr, NFS, iSCSI, CIFS, rbd-mirror, …
● streamlined provisioning and management
  – adding nodes, replacing failed disks, …
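Roughly, the ceph/daemon image built by ceph-container runs one daemon type per container; a sketch of bootstrapping a monitor (addresses are placeholders, and the exact flags follow the project README, which may change):

  docker run -d --net=host \
      -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
      -e MON_IP=192.168.0.20 -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
      ceph/daemon mon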
RADOS

● The future is
  – NVMe
  – DPDK/SPDK
  – futures-based programming model for OSD
  – painful but necessary
● BlueStore and rocksdb optimization
  – rocksdb level0 compaction
  – alternative KV store?
● RedStore?
  – considering an NVMe-focused backend beyond BlueStore
● Erasure coding plugin API improvements
  – new codes with less IO for single-OSD failures

[Diagram: OSD stack – messenger, OSD, ObjectStore, device.]
EASE OF USE ODDS AND ENDS

● centralized config management (sketch below)
  – all config on mons – no more ceph.conf
  – ceph config …
● PG autoscaling
  – less reliance on the operator to pick the right values
● progress bars
  – recovery, rebalancing, etc.
● async recovery
  – fewer high-latency outliers during recovery
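A sketch of the centralized configuration workflow as it later surfaced in Mimic (the option used here is just an example):

  ceph config assimilate-conf -i /etc/ceph/ceph.conf   # import an existing ceph.conf into the mons
  ceph config set osd osd_max_backfills 2              # set an option centrally for all OSDs
  ceph config dump                                     # show the cluster-wide configuration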
QUALITY OF SERVICE

● Ongoing background development
  – dmclock distributed QoS queuing
  – minimum reservations and priority weighting
● Range of policies
  – IO type (background, client)
  – pool-based
  – client-based
● Theory is complex
● Prototype is promising, despite simplicity
● Basics will land in Mimic
TIERING

● new RADOS ‘redirect’ primitive
  – basically a symlink, transparent to librados
  – replace “sparse” cache tier with base pool “index”

[Diagram: today the application writes through a replicated cache pool in front of an HDD and/or erasure-coded base pool; with redirects, the application talks to a replicated SSD base pool that points at one or more slower (e.g. EC) pools.]
DEDUPLICATION (WIP)

● Generalize redirect to a “manifest”
  – map of offsets to object “fragments” (vs a full object copy)
● Break objects into chunks
  – fixed size, or content fingerprint
● Store chunks in content-addressable pool
  – name object by sha256(content)
  – reference count chunks
● TBD
  – inline or post?
  – policy (when to promote inline, etc.)
  – agent home (embed in OSD, …)

[Diagram: application over a replicated SSD base pool, backed by a slow (EC) pool and a CAS dedup pool.]
CEPHFS

● Integrated NFS gateway management
  – HA nfs-ganesha service
  – API configurable, integrated with Manila
● snapshots! (example below)
● quota for kernel client
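Snapshots keep their existing interface; a sketch assuming a CephFS mount at /mnt/cephfs (before Mimic they must be explicitly allowed):

  ceph fs set cephfs allow_new_snaps true
  mkdir /mnt/cephfs/mydir/.snap/before-upgrade    # create a snapshot of mydir
  rmdir /mnt/cephfs/mydir/.snap/before-upgrade    # remove it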
CLIENT CACHES!

● RGW
  – persistent read-only cache on NVMe
  – fully consistent (only caches immutable “tail” rados objects)
  – Mass Open Cloud
● RBD
  – persistent read-only cache of immutable clone parent images
  – writeback cache for improving write latency
    ● cluster image remains crash-consistent if client cache is lost
● CephFS
  – kernel client already uses kernel fscache facility
GROWING DEVELOPER COMMUNITY

● Red Hat
● Mirantis
● SUSE
● ZTE
● China Mobile
● XSky
● Digiware
● Intel
● Kylin Cloud
● Easystack
● Istuary Innovation Group
● Quantum
● Mellanox
● H3C
● UnitedStack
● Deutsche Telekom
● Reliance Jio Infocomm
● OVH
● Alibaba
● DreamHost
● CERN
OPENSTACK VENDORS
(same list as above)
CLOUD OPERATORS
(same list as above)
HARDWARE AND SOLUTION VENDORS
(same list as above)
???
(same list as above)
GET INVOLVED

● Mailing list and IRC
  – http://ceph.com/IRC
● Ceph Developer Monthly
  – first Weds of every month
  – video conference (Bluejeans)
  – alternating APAC- and EMEA-friendly times
● Github
  – https://github.com/ceph/
● Ceph Days
  – http://ceph.com/cephdays/
● Meetups
  – http://ceph.com/meetups
● Ceph Tech Talks
  – http://ceph.com/ceph-tech-talks/
● ‘Ceph’ Youtube channel
  – (google it)
● Twitter
  – @ceph
CEPHALOCON APAC
2018.03.22 and 23
BEIJING, CHINA
THANK YOU!
Sage Weil
CEPH ARCHITECT
sage@redhat.com
@liewegas