
Crimson: Ceph for the Age of NVMe and Persistent Memory


Ceph is a mature open source software-defined storage solution created over a decade ago. Since then, new and faster storage technologies have emerged, including NVMe and persistent memory. The Crimson project aims to build a better Ceph OSD, one better suited to these faster devices. The Crimson OSD is built on the Seastar C++ framework and leverages these devices by minimizing latency, CPU overhead, and cross-core communication. This talk covers the project design, its current status, and future plans.



1. Brought to you by
   Crimson: Ceph for the Age of NVMe and Persistent Memory
   Orit Wasserman, Sr. Principal Software Engineer at Red Hat
2. Orit Wasserman, Architect at Red Hat
   ■ Currently focusing on OpenShift Data Foundation and Ceph
   ■ 25+ years developing low-level software, cloud infrastructure, and storage
   ■ 10+ years working on open source
   ■ Working from home since before COVID
3. Storage Keeps Getting Faster
4. CPUs, Not So Much
   With a CPU clocked at 3 GHz you can afford:
   - HDD: ~20 million cycles/IO
   - SSD: ~300,000 cycles/IO
   - NVMe: ~6,000 cycles/IO
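   Those budgets fall straight out of dividing the clock rate by device throughput. A quick sketch of the arithmetic; the per-device IOPS figures are assumptions back-derived from the slide's numbers, not measurements:

      #include <cstdio>

      int main() {
          const double clock_hz = 3e9;  // 3 GHz CPU
          // Rough per-device IOPS implied by the slide's budgets.
          struct Device { const char* name; double iops; };
          const Device devices[] = {
              {"HDD",  150},      // ~150 IOPS  -> ~20M cycles/IO
              {"SSD",  10'000},   // ~10K IOPS  -> ~300K cycles/IO
              {"NVMe", 500'000},  // ~500K IOPS -> ~6K cycles/IO
          };
          for (const auto& d : devices)
              std::printf("%-4s: ~%.0f cycles per IO\n", d.name, clock_hz / d.iops);
          return 0;
      }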
5. Ceph
   ■ Open source project
   ■ Software-defined storage
   ■ Runs on commodity hardware
     ● Commodity servers
     ● IP networks
     ● HDDs, SSDs, NVMe, NV-DIMMs, …
   ■ A single cluster can serve object, block, and file
   ■ Highly available and resilient, with self-healing
   ■ Scalable
6. Ceph Architecture
   ■ RGW: S3-compatible object storage gateway (OBJECT)
   ■ RBD: virtual block device (BLOCK)
   ■ CEPHFS: distributed network file system (FILE)
   ■ LIBRADOS: low-level storage API
   ■ RADOS: reliable, elastic, distributed storage layer with replication and erasure coding
7. Ceph RADOS Components
   Monitor (ceph-mon)
   ■ Central authority for authentication, data placement, and policy
   ■ Coordination point for all other cluster components
   ■ Protects critical cluster state with Paxos
   ■ 3, 5, or 7 per cluster
   OSD (Object Storage Daemon, ceph-osd)
   ■ Stores data on an HDD or SSD
   ■ Services client IO requests
   ■ Cooperatively peers, replicates, and rebalances data
   ■ 10s-1000s per cluster
8. CRUSH: Calculated Placement
   [diagram: an application writes a data object through LIBRADOS to the OSDs, using a cluster map obtained from the monitors]
   ■ Get a map of the cluster layout (number of OSDs, etc.) on startup
   ■ Calculate the correct object location based on its name
   ■ Read from or write to the appropriate OSD
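   Because placement is computed, a client can find any object without a lookup service: the object name plus the cluster map fully determine the target. A deliberately simplified sketch of that idea; real CRUSH walks a weighted hierarchy of racks, hosts, and devices, and the hash and modulo mapping here are illustrative assumptions, not Ceph's actual algorithm:

      #include <cstdint>
      #include <functional>
      #include <string>
      #include <vector>

      // Toy cluster map: just a PG count and a flat list of OSD ids.
      struct ClusterMap {
          uint32_t pg_count;
          std::vector<int> osds;
      };

      // Object name -> placement group -> OSD, purely by computation.
      int locate(const std::string& object_name, const ClusterMap& map) {
          uint32_t pg = std::hash<std::string>{}(object_name) % map.pg_count;
          return map.osds[pg % map.osds.size()];  // toy PG -> OSD step
      }

   Every client and OSD runs the same computation against the same map, so all parties agree on where an object lives without ever asking a directory server.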
9. OSD Threading Architecture
   [diagram: Async Messenger worker threads feed the osd op queue, PG State worker threads feed the kv queue, BlueStore worker threads complete the IO, and responses return through the out_queue to Async Messenger worker threads]
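   In the classic OSD, each stage owns a pool of worker threads and hands requests to the next stage through a shared, lock-protected queue. A minimal sketch of that handoff pattern; the class name is illustrative, not an actual Ceph type:

      #include <condition_variable>
      #include <mutex>
      #include <queue>

      // One stage-to-stage queue, as between the messenger workers and
      // the PG-state workers. Each pop can wake a thread on another
      // core, paying for lock contention and a context switch.
      template <typename T>
      class OpQueue {
          std::queue<T> q_;
          std::mutex m_;
          std::condition_variable cv_;
      public:
          void push(T op) {
              { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(op)); }
              cv_.notify_one();
          }
          T pop() {
              std::unique_lock<std::mutex> lk(m_);
              cv_.wait(lk, [this] { return !q_.empty(); });
              T op = std::move(q_.front());
              q_.pop();
              return op;
          }
      };

   Every hop through such a queue is a potential cross-core wakeup and data copy, which is exactly the overhead Crimson sets out to remove.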
10. Classic OSD Threads [diagram]
11. Classic OSD Threads [diagram, continued]
12. Classic OSD Threads [diagram, continued]
13. Classic OSD Threads [diagram, continued]
14. Crimson
   crimson-osd aims to be a replacement OSD daemon with the following goals:
   ■ Minimize CPU overhead
     ● Minimize cycles/IOP
     ● Minimize cross-core communication
     ● Minimize copies
     ● Bypass the kernel, avoid context switches
   ■ Enable emerging storage technologies
     ● Zoned Namespaces
     ● Persistent Memory
     ● Fast NVMe
15. Seastar
   ■ A single reactor thread per CPU
   ■ Asynchronous IO
   ■ Scheduling done in user space
   ■ Direct support for DPDK, a high-performance userspace networking library
16. Programming Model

      // Classic blocking model: each call blocks the thread
      // until its IO completes.
      for (;;) {
          auto req = connection.read_request();
          auto result = handle_request(req);
          connection.send_response(result);
      }

      // Seastar model: the same loop as a chain of futures and
      // continuations; the reactor runs other tasks while each
      // stage's IO is in flight. (seastar::keep_doing repeats a
      // future-returning task indefinitely.)
      seastar::keep_doing([conn, this] {
          return conn.read_request().then([this](auto req) {
              return handle_request(req);
          }).then([conn](auto result) {
              return conn.send_response(result);
          });
      });
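   The payoff: while one request's IO is in flight, the same reactor thread keeps executing other requests' continuations, so no worker pools, queues, or locks are needed between stages.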
17. Crimson Threading Architecture
   [diagram: several Seastar reactor threads, one per core, each running Messenger -> PG -> ObjectStore as a chain of async waits, contrasted with the classic Async Messenger / PG state / BlueStore pipeline]
18. Crimson Threading Architecture
   ■ PG state is handled by a single OSD
   ■ First phase: a single reactor thread per OSD (per core)
   ■ Second phase: multiple reactor threads per OSD (see the sketch below)
     ● Work in progress
     ● Route requests to different reactors
     ● Move requests between cores as little as possible
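   A hedged sketch of the phase-two routing idea using Seastar's cross-shard call; Op, handle_op, and the PG-to-shard mapping are illustrative assumptions, not Crimson's actual code:

      #include <seastar/core/future.hh>
      #include <seastar/core/smp.hh>

      struct Op { unsigned pg_id; /* payload elided */ };

      // Assumed per-shard handler; in Crimson this would be the PG pipeline.
      seastar::future<> handle_op(Op op);

      // Hop to the reactor (shard) that owns the op's PG exactly once, at
      // the entry point; everything afterwards stays on that core.
      seastar::future<> route_op(Op op) {
          unsigned shard = op.pg_id % seastar::smp::count;  // illustrative mapping
          return seastar::smp::submit_to(shard, [op = std::move(op)]() mutable {
              return handle_op(std::move(op));
          });
      }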
19. Performance
20. BlueStore - Seastar Native
   [diagram: BlueStore components on a native Seastar stack: Allocator (data, metadata), RocksDB with a SeastarEnv, BlueFS*, and Seastar AIO]
21. BlueStore - Alien
   [diagram: the OSD's Seastar reactor delegating BlueStore work to "alien" threads outside the reactor]
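   The alien approach keeps the existing synchronous BlueStore code unchanged: blocking calls run on ordinary threads outside the reactor, which picks up the result asynchronously. A conceptual sketch using plain std::future; Crimson's real implementation integrates a dedicated alien thread pool with Seastar futures, so everything here is illustrative:

      #include <future>
      #include <thread>

      // Stand-in for a blocking BlueStore operation.
      int blocking_bluestore_write(/* transaction elided */) { return 0; }

      // Run the blocking call on an "alien" thread and hand back a future
      // the non-blocking caller can collect later. Real code would use a
      // long-lived thread pool rather than a detached thread per op.
      std::future<int> submit_to_alien() {
          std::packaged_task<int()> task(blocking_bluestore_write);
          std::future<int> result = task.get_future();
          std::thread(std::move(task)).detach();
          return result;
      }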
22. Status and Plans
   ■ Tech Preview in Ceph Quincy (2022 Q1):
     ● Replication
     ● Peering/Recovery/Backfill
     ● BlueStore support (via alien)
     ● RBD workloads
     ● Multiple reactor threads (multi-core)
   ■ Near future:
     ● RGW and CephFS support
     ● Erasure coding support
     ● Snapshots
   ■ Far future:
     ● Seastore
     ● SPDK and/or io_uring
     ● Networking
23. Questions?
   ■ Roadmap: https://github.com/ceph/ceph-notes/blob/master/crimson/status.rst
   ■ Project tracker: https://trello.com/b/lbHkWKxh/crimson
   ■ Documentation: https://github.com/ceph/ceph/blob/master/doc/dev/crimson.rst
   ■ Seastar tutorial: https://github.com/scylladb/seastar/wiki/Seastar-Tutorial
24. Brought to you by
   Orit Wasserman
   owasserm@redhat.com
   @oritwas
