Ceph Project Update
Sage Weil & Josh Durgin
Ceph Month - 2021.06.01
AGENDA
● Ceph background
● Ceph Month
● Ceph Foundation update
● Sepia lab update
● Telemetry
● The future
WHAT IS CEPH?
The buzzwords
● “Software defined storage”
● “Unified storage system”
● “Scalable distributed storage”
● “The future of storage”
● “The Linux of storage”
The substance
● Ceph is open source software
● Runs on commodity hardware
○ Commodity servers
○ IP networks
○ HDDs, SSDs, NVMe, NV-DIMMs, ...
● A single cluster can serve object, block, and file workloads
CEPH IS FREE AND OPEN SOURCE
● Freedom to use (free as in beer)
● Freedom to introspect, modify, and share (free as in speech)
● Freedom from vendor lock-in
● Freedom to innovate
CEPH IS RELIABLE
● Reliable storage service out of unreliable components
○ No single point of failure
○ Data durability via replication or erasure coding
○ No interruption of service from rolling upgrades, online expansion, etc.
● Favor consistency and correctness over performance
CEPH IS SCALABLE
● Ceph is elastic storage infrastructure
○ Storage cluster may grow or shrink
○ Add or remove hardware while the system is online and under load
● Scale up with bigger, faster hardware
● Scale out within a single cluster for capacity and performance
● Federate multiple clusters across sites with asynchronous replication and disaster recovery capabilities
CEPH IS A UNIFIED STORAGE SYSTEM
● RADOS: reliable, elastic, distributed storage layer with replication and erasure coding
● LIBRADOS: low-level storage API
● RGW: S3 and Swift object storage (OBJECT)
● RBD: virtual block device (BLOCK)
● CEPHFS: distributed network file system (FILE)
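
To make the layering concrete, here is a minimal, hypothetical sketch using Ceph's Python bindings: one cluster handle serving object, block, and file access. It assumes a local /etc/ceph/ceph.conf with admin credentials, an existing pool named "demo", and a CephFS filesystem already created; the pool, object, image, and directory names are placeholders, not anything from the slides.

# Minimal sketch: one Ceph cluster, three interfaces (object, block, file).
# Assumes /etc/ceph/ceph.conf + client.admin keyring, a pool named "demo",
# and an existing CephFS filesystem; all names here are placeholders.
import rados, rbd, cephfs

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# OBJECT: write and read a RADOS object via the low-level librados API
ioctx = cluster.open_ioctx('demo')
ioctx.write_full('greeting', b'hello from RADOS')
print(ioctx.read('greeting'))

# BLOCK: create a 1 GiB RBD image backed by the same pool
rbd.RBD().create(ioctx, 'demo-image', 1 * 1024**3)

# FILE: create a directory in CephFS served by the same cluster
fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()
fs.mkdir('/demo-dir', 0o755)

ioctx.close()
cluster.shutdown()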
RELEASE SCHEDULE
● Nautilus - 14.2.z - Mar 2019
● Octopus - 15.2.z - Mar 2020
● Pacific - 16.2.z - Mar 2021 (WE ARE HERE)
● Quincy - 17.2.z - Mar 2022
● Stable, named release every 12 months
● Backports for 2 releases
○ Bug fixes and security updates
○ Nautilus reaches EOL shortly after Pacific is released
● Upgrade up to 2 releases at a time
○ Nautilus → Pacific, Octopus → Quincy
● Released as packages (deb, rpm) and container images
● Process improvements (security hotfixes; regular cadence)
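
Because only two-release jumps are supported, it helps to confirm what every daemon is actually running before starting an upgrade. A rough sketch below uses the librados Python binding's mon_command() to issue the equivalent of the `ceph versions` CLI command; the conf path and admin credentials are assumed to be available locally.

# Sketch: report which Ceph versions the running daemons advertise,
# equivalent to `ceph versions`. Assumes /etc/ceph/ceph.conf + admin keyring.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ret, outbuf, errs = cluster.mon_command(
    json.dumps({'prefix': 'versions', 'format': 'json'}), b'')
if ret == 0:
    print(json.dumps(json.loads(outbuf), indent=2))
else:
    print('versions query failed:', errs)
cluster.shutdown()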
CEPH MONTH - JUNE 2021
CEPH MONTH
● Goals
○ More interactive
○ Bite-sized
● Format
○ 1-2 hrs
○ ~2 blocks per week
○ A few planned talks
○ Un/semi-structured discussion time
○ Lightning talks sprinkled throughout
● Etherpads
○ Add your questions, or ask them verbally
○ Add any discussion topics
● Week of June 1 - 4
○ RADOS
○ Windows
● Week of June 7 - 11
○ RGW
○ Performance
● Week of June 14 - 18
○ RBD
○ Dashboard
○ Lightning talks
● Week of June 21 - 25
○ CephFS
○ cephadm
https://pad.ceph.com/p/ceph-month-june-2021
CEPHALOCON 2022
● It will be in March 2022…
● No location yet
○ Seoul?
○ North America? (Portland?)
○ ???
● Expected to be in-person
○ Possibly with hybrid elements?
● We are very interested in community feedback!
CEPH FOUNDATION UPDATE
PREMIER MEMBERS
GENERAL MEMBERS
ASSOCIATE MEMBERS
CURRENT PROJECTS
● Ceph documentation
○ Zac Dover, full-time technical writer
● ceph.io web site update
○ Spearheaded by SoftIron
○ Static site generator; GitHub; no more WordPress
○ https://github.com/ceph/ceph.io
○ Planned launch next month!
● Training materials
○ Working with Linux Foundation’s training group
○ Building out initial free course material (w/ JC Lopez)
○ edX and/or LF hosted; can support both self-paced and instructor-led delivery
○ Potential in future for advanced material, paid courses, and/or certifications
○ LF training group is revenue neutral; collaborative development process with community
CURRENT PROJECTS
● Reducing cloud spend with OVH
○ Build and CI hardware purchases for Sepia lab
○ We are now only hosting public-facing infra in OVH
● Lab hardware
○ Build machines
○ Expanding the lab’s Ceph cluster (more storage for test results, etc.)
● Windows support
○ Contract with CloudBase to finish initial development, build sustainable CI infrastructure
○ RBD, CephFS
● New marketing committee
SEPIA LAB UPDATE
● More hardware from the Ceph Foundation
○ Expanding the lab’s Ceph cluster
○ More build machines (braggi)
○ More test nodes (gibba)
● Improved teuthology test infrastructure
○ Moved to a single-process dispatcher (Shraddha Agrawal)
○ Replaced the limited in-memory queue with PostgreSQL (Aishwarya Mathuria)
○ Enables larger scale test clusters
○ Ability to prioritize and use lab more efficiently
● Downgrade testing (WIP)
○ Downgrade within a major release (e.g., 16.2.4 → 16.2.3)
○ Now feasible with cephadm
ARM AARCH64 SUPPORT
● Hardware donated by Ampere
● CI builds for teuthology, releases
○ CentOS 8 RPMs, Ubuntu Focal 20.04
○ Container images (based on CentOS)
● Addressing some issues with the bleeding edge of podman/quay and multi-arch support
TELEMETRY UPDATE
https://telemetry-public.ceph.com/
TELEMETRY AND CRASH REPORTS
● Opt-in
○ Will require re-opt-in if telemetry content is expanded in the future
○ Explicitly acknowledge the data sharing license
● Basic channel
○ Cluster size, version
○ Which features are enabled
● Crash channel
○ Anonymized crash metadata
○ Where in the code the problem happened, what version, etc.
○ Extensive (private) dashboard
○ Integration into tracker.ceph.com WIP
● Device channel
○ HDD vs SSD, vendors, models
○ Health metrics (e.g., SMART)
○ Extensive dashboard (link from top right)
● Ident channel
○ Off by default
○ Optional contact information
● Future performance channel
○ Planned for Quincy
○ Optional, more granular (but still anonymized) data about workloads, IO sizes, IO rates, cache hit rates, etc.
○ Help developers optimize Ceph
○ Possibly tuning suggestions for users
● Transparency!
https://telemetry-public.ceph.com/
IS TELEMETRY ENABLED?
WHY IS TELEMETRY NOT ENABLED?
IT’S EASY!
● Review and opt-in
● SOCKS proxy support for clusters without direct Internet access
● https://docs.ceph.com/en/latest/mgr/telemetry/
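
For reference, a hedged sketch of what the opt-in can look like when scripted through the librados Python binding's mgr_command() passthrough rather than the `ceph` CLI. The command prefixes mirror `ceph telemetry show` and `ceph telemetry on`, and the license string follows the documentation linked above; treat the exact invocation as an assumption and check the docs first.

# Sketch: review the telemetry report, then opt in, via the manager command
# interface (equivalent to `ceph telemetry show` / `ceph telemetry on`).
# Assumes /etc/ceph/ceph.conf and admin credentials; verify details against
# https://docs.ceph.com/en/latest/mgr/telemetry/ before relying on this.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# 1. Review exactly what would be reported
ret, report, errs = cluster.mgr_command(json.dumps({'prefix': 'telemetry show'}), b'')
print(report.decode())

# 2. Opt in, explicitly acknowledging the data sharing license
cluster.mgr_command(json.dumps({'prefix': 'telemetry on',
                                'license': 'sharing-1-0'}), b'')
cluster.shutdown()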
THE FUTURE...
OUT OF THE BOX EXPERIENCE
● Cephadm has brought end-to-end management of Ceph deployments
● Cluster management via Ceph dashboard
● Simple experience for non-enterprise deployments
○ Small/medium businesses, remote offices, etc.
○ NAS replacement
● Turn-key support for NFS, object
○ SMB coming in Quincy
NEW DEVICES
● ZNS SSDs
○ 3D NAND … dense, but the erase blocks are huge
○ Zone-based write interface
○ Combines capacity, low cost, and good performance
○ Key focus of Crimson’s SeaStore!
● Multi-actuator HDDs
○ Recent devices double IOPS in the existing HDD package
○ Ceph treats them as two OSDs with a shared failure domain
● Persistent memory
○ Will be well-supported (but not required) by Crimson
○ Recent support in RBD client-side write-back cache
NVMe FABRICS
● Client-side
○ NVMe-oF target that presents an RBD device
○ Alternative to iSCSI
○ Can be combined with new hardware (e.g., SmartNICs like NVIDIA’s BlueField) to present an NVMe device on the PCIe bus while running gateway/librbd code on the card’s “DPU”
○ Useful for “metal as a service” cloud infrastructure
● Server-side
○ Some discussion around Crimson “phase 2”
○ Enable the primary OSD to write directly to replica OSDs’ devices
○ Mechanism to reduce CPU cost per IO
INTEGRATIONS / ECOSYSTEMS
● Maturing
○ Rook
■ Key focus: Ceph orchestrator / dashboard integration with Rook
○ Knative
○ Spark
■ S3 SELECT
○ Multisite
■ Interop with public clouds
● New
○ Apache Arrow / Parquet
■ Data interchange formats for data pipelines
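
As an illustration of the S3 SELECT path called out above (the same capability the Spark integration builds on), here is a hypothetical query pushed down to an RGW endpoint with boto3. The endpoint URL, credentials, bucket, key, and CSV layout are placeholders, and RGW's s3select support was still maturing at this point.

# Hypothetical example: push a SQL filter down to RGW's S3 SELECT support
# instead of downloading the whole object. Endpoint, credentials, bucket,
# key, and the CSV schema are placeholders.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8000',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

resp = s3.select_object_content(
    Bucket='analytics',
    Key='events.csv',
    ExpressionType='SQL',
    Expression="select * from s3object s where s._1 = 'error'",
    InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
    OutputSerialization={'CSV': {}},
)

# The response is an event stream; Records events carry the filtered rows.
for event in resp['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode(), end='')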
NAMING THE R RELEASE
https://pad.ceph.com/p/r
Questions?
Up next: RADOS
https://pad.ceph.com/p/ceph-month-june-2021
