Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
This document discusses an all-flash Ceph array design from QCT based on NUMA architecture. The agenda covers why all-flash storage is used, all-flash Ceph use cases, QCT's IOPS-optimized all-flash Ceph solution, the benefits of NVMe storage, QCT's lab test environment and detailed architecture, the importance of NUMA, Ceph tuning recommendations, and the benefits of multi-partitioned NVMe SSDs for Ceph OSDs.
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and 3D NAND TLC SSDs
The document discusses using Intel Optane SSDs as journal/metadata drives and Intel 3D NAND TLC SSDs as data drives in Ceph clusters. It provides example configurations and an analysis of a 2.8 million IOPS Ceph cluster built with this approach, along with tuning recommendations for optimizing performance.
This document provides a community update on Ceph activities in 2017. It discusses several Ceph Days conferences scheduled around the world, metrics on community growth, the User Committee and governance efforts. It also outlines technical talks, performance discussions, developer meetings and the new Ceph.com site. The document highlights contributions from the China community including code commits, meetups with over 1,000 attendees, an upcoming Ceph white paper, new books and the first Ceph training and certification program.
Ceph Day Beijing - Storage Modernization with Intel and Ceph
The document discusses trends in data growth and storage technologies that are driving the need for storage modernization. It outlines Intel's role in advancing the storage industry through open source technologies and standards. A significant portion of the document focuses on Intel's work optimizing Ceph for Intel platforms, including profiling and benchmarking Ceph performance on Intel SSDs, 3D XPoint, and Optane drives.
This document contains an agenda for a Ceph conference. It lists the scheduled time, duration, topic, and speakers for each presentation during the day. Talks will cover various topics related to optimizing Ceph performance using Intel technologies, all-flash array design using Ceph, global deduplication solutions for Ceph, and experiences deploying large-scale Ceph clusters at companies like Alibaba and China Mobile. The day includes keynotes, presentations, breaks for networking, and a closing session.
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster at Alibaba
This document discusses optimizing performance in large scale Ceph clusters at Alibaba. It describes two use models for writing data in Ceph and improvements made to recovery performance by implementing partial and asynchronous recovery. It also details fixes for bugs that caused data loss or inconsistency. Additionally, it proposes offloading transaction queueing from PG workers to asynchronous transaction workers to improve performance, and evaluates this approach through bandwidth testing.
Ceph Day Beijing - Small Files & All Flash: Inspur's works on Ceph
Inspur has been working on improving Ceph for small file workloads and all-flash solutions. Some of the issues they are addressing include:
1) Too many connections between the NAS layer and the file system, which tie up too many resources.
2) Heavy workloads on RADOS objects and the bottleneck of a single MDS.
3) Solutions under development include multi-MDS, small-file aggregation, single-process OSD, BlueStore optimizations, deduplication, and I/O stack optimizations.
The document discusses using the Storage Performance Development Kit (SPDK) to optimize Ceph performance. SPDK provides userspace libraries and drivers to unlock the full potential of Intel storage technologies. It summarizes current SPDK support in Ceph's BlueStore backend and proposes leveraging SPDK further to accelerate Ceph's block services through optimized SPDK targets and caching. Collaboration is needed between the SPDK and Ceph communities to fully realize these optimizations.
Ceph is evolving its network stack to improve performance. Building on AsyncMessenger, it is adding RDMA support for better scalability and lower latency. RDMA support is now built into Ceph and provides native RDMA using verbs or RDMA-CM, which allows Ceph to run over InfiniBand or RoCE networks. Work continues to fully leverage RDMA for features like zero-copy replication and erasure coding offload.
13. TCMU -- Ring Buffer Architecture and Current Status
As the code shows, the ring buffer size is currently fixed at
(256 + 16) * 4096 = 1 MB + 64 KB,
where
sizeof(Mailbox + Command Ring) = 64 KB and
sizeof(Data Area) = 1 MB.
The Command Ring is managed as a pseudo-array in which each element carries a length attribute.
The Data Area is managed through a bitmap.
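To make the layout concrete, here is a minimal sketch in kernel-style C of how the fixed ring maps to code. The constant names follow the pre-optimization drivers/target/target_core_user.c, and tcmu_get_empty_block is simplified from the bitmap allocator there; treat it as an illustration rather than the exact mainline source.

    #include <linux/bitops.h>   /* find_first_zero_bit(), set_bit() */
    #include <linux/errno.h>

    /* Fixed ring layout: (256 + 16) * 4096 = 1 MB + 64 KB in total. */
    #define CMDR_SIZE       (16 * 4096)  /* Mailbox + Command Ring: 64 KB */
    #define DATA_BLOCK_SIZE 4096
    #define DATA_BLOCK_BITS 256          /* one bitmap bit per 4 KB data block */
    #define DATA_SIZE       (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)  /* 1 MB */
    #define TCMU_RING_SIZE  (CMDR_SIZE + DATA_SIZE)

    /* The Command Ring is a pseudo-array of variable-length entries, each
     * carrying its own length so a reader can step to the next entry; Data
     * Area blocks are handed out by scanning the bitmap. */
    static int tcmu_get_empty_block(unsigned long *data_bitmap)
    {
            int block = find_first_zero_bit(data_bitmap, DATA_BLOCK_BITS);

            if (block >= DATA_BLOCK_BITS)
                    return -ENOSPC;      /* 1 MB Data Area exhausted */
            set_bit(block, data_bitmap);
            return block;
    }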
14. TCMU -- Ring Data Area Optimization
Building on the TCMU & UIO framework, this adds dynamic resizing of the Ring's Data Area.
For a single Target Ring:
• The maximum Data Area size is capped at 256 * 1024 pages, i.e. 1 GB at a 4 KB page size.
• The initial Data Area size is 128 pages, i.e. 512 KB at a 4 KB page size.
• The Data Area grows or shrinks dynamically according to actual demand.
Given the limits of system memory, leaving the Data Areas of all targets' rings under TCMU uncapped would exhaust system memory. The solution:
• Cap the total Data Area size across all rings under TCMU, currently at 2 GB, and build a buffer pool, the GDAP (Global Data Area Pool).
• Whenever any Target Ring's Data Area grows dynamically, pages are requested from and released back to the GDAP (see the sketch below).
Result: read and write IOPS improve by roughly 100% to 700% respectively; the heavier the application load, the more pronounced the gain.
The patches for this scheme have been merged into the mainline kernel.
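As a rough illustration of the GDAP accounting described above, the sketch below charges every page a ring grows by against a global 2 GB budget (512K pages at 4 KB each). The gdap_alloc_page/gdap_free_page names are hypothetical; the mainline patches additionally cache freed pages per ring and reclaim pages from idle rings, which is omitted here.

    #include <linux/atomic.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    #define GDAP_MAX_PAGES  (512 * 1024)  /* 2 GB total across all rings   */
    #define RING_MAX_PAGES  (256 * 1024)  /* 1 GB per-ring cap (see above) */

    static atomic_t gdap_page_count = ATOMIC_INIT(0);

    /* Grow one ring's Data Area by a page, charged against the global pool. */
    static struct page *gdap_alloc_page(void)
    {
            struct page *page;

            if (atomic_inc_return(&gdap_page_count) > GDAP_MAX_PAGES) {
                    atomic_dec(&gdap_page_count);
                    return NULL;  /* pool exhausted: the ring must wait or shrink */
            }

            page = alloc_page(GFP_KERNEL);
            if (!page)
                    atomic_dec(&gdap_page_count);
            return page;
    }

    /* Shrink path: return a page to the system and credit the pool. */
    static void gdap_free_page(struct page *page)
    {
            __free_page(page);
            atomic_dec(&gdap_page_count);
    }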
15. TCMU -- Ring Cmd Area Optimization
The Cmd Area holds SCSI command header control information, which is tiny compared with the data a SCSI command transfers. With the Data Area capped at 1 GB, the Cmd Area is estimated to need at most 8 MB (Cmd Area : Data Area ~= 8 : 1024), so the per-target Cmd Area size is currently set to 8 MB.
An 8 MB Cmd Area works well alongside a 1 GB Data Area in most cases, but for SCSI workloads dominated by many small transfers, 8 MB can still become a performance bottleneck.
Exploiting the way a SCSI command's DMA scatter/gather segments are converted to and merged into iovec regions, each SCSI entry in the Cmd Area is slimmed down; after the optimization every SCSI entry saves more than half of its memory, making the 8 MB Cmd Area more than sufficient.
Result: about half of the Cmd Area memory is saved.
The slimming patches have also been merged into the mainline kernel.
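The slimming relies on the fact that contiguous scatter/gather segments collapse into a single iovec. The hypothetical helper below (plain userspace C, for illustration only, not the kernel patch itself) shows the coalescing idea: whenever the next data block starts exactly where the previous iovec ends, that iovec is extended instead of a new one being emitted, so an entry needs far fewer iovec slots.

    #include <stddef.h>
    #include <sys/uio.h>    /* struct iovec */

    /* Merge adjacent data blocks into as few iovecs as possible. Returns
     * the number of iovecs actually used, always <= nblocks. */
    static size_t merge_blocks_to_iov(void *base[], size_t len[],
                                      size_t nblocks, struct iovec *iov)
    {
            size_t niov = 0;

            for (size_t i = 0; i < nblocks; i++) {
                    if (niov > 0 &&
                        (char *)iov[niov - 1].iov_base + iov[niov - 1].iov_len
                                == base[i]) {
                            iov[niov - 1].iov_len += len[i];  /* contiguous: extend */
                    } else {
                            iov[niov].iov_base = base[i];     /* gap: new iovec */
                            iov[niov].iov_len = len[i];
                            niov++;
                    }
            }
            return niov;
    }

When the blocks backing a command happen to be contiguous, an entry that previously carried one iovec per 4 KB block now carries a single merged iovec, which is consistent with the greater-than-half per-entry saving reported above.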