XSKY - ceph luminous update

Speaker:
Haomai Wang (CTO, XSKY (星辰天合(北京)数据科技有限公司), and distinguished Ceph core developer)

Abstract:
XSKY is a startup focused on software-defined infrastructure and storage that has risen rapidly in China's IT industry in recent years. Its CTO, Haomai Wang, has worked in open-source storage for many years; he joined the Ceph community in 2013 and was the first core developer in the world to receive the official Ceph distinguished contributor title. For this Ceph Day Taiwan we are honored to have Mr. Wang travel here to share the latest progress on the Ceph Luminous release published earlier this year. This is a rare opportunity for exchange, so please make the most of it!

  1. CEPH LUMINOUS UPDATE
     XSKY – Haomai Wang – 2017.06.06
  2. Who Am I
     • Haomai Wang, active Ceph contributor
     • Maintains multiple components
     • CTO of XSKY, a China-based storage startup
     • haomaiwang@gmail.com / haomai@xsky.com
  3. Releases
     • Hammer v0.94.x (LTS) – March '15
     • Infernalis v9.2.x – November '15
     • Jewel v10.2.x (LTS) – April '16
     • Kraken v11.2.x – December '16
     • Luminous v12.2.x (LTS) – September '17 (delayed)
  4. Ceph Ecosystem
  5. RADOS – BlueStore
     • BlueStore = Block + NewStore
     • Key/value database (RocksDB) for metadata
     • All data written directly to raw block device(s)
     • Fast on both HDDs (~2x) and SSDs (~1.5x)
       – similar to FileStore on NVMe, where the device is not the bottleneck
     • Full data checksums (crc32c, xxhash, etc.)
     • Inline compression (zlib, snappy, zstd)
     • Stable, and the default in Luminous
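     A minimal ceph.conf sketch of the checksum and compression knobs mentioned above, assuming a Luminous BlueStore OSD; the values shown are illustrative choices, not recommendations:

       [osd]
       # full data checksums; crc32c is the default, xxhash32/xxhash64 are alternatives
       bluestore csum type = crc32c
       # inline compression: algorithm and policy
       bluestore compression algorithm = snappy    # zlib and zstd are also supported
       bluestore compression mode = aggressive     # none | passive | aggressive | force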
  6. RADOS – RBD Over Erasure Code
     • Requires BlueStore to perform reasonably
     • Significant improvement in efficiency over 3x replication
       – 2+2 → 2x, 4+2 → 1.5x
     • Small writes slower than replication
       – early testing showed 4+2 is about half as fast as 3x replication
     • Large writes faster than replication
       – less IO to the device
     • Implementation still does the "simple" thing
       – all writes update a full stripe
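     A sketch of placing an RBD image on an erasure-coded data pool in Luminous; pool names, PG counts, and the image size are illustrative:

       # replicated pool for image metadata, EC pool for data
       ceph osd pool create rbd-meta 64 64 replicated
       ceph osd pool create rbd-data 64 64 erasure
       # EC overwrites (new in Luminous) require BlueStore OSDs
       ceph osd pool set rbd-data allow_ec_overwrites true
       # image data goes to the EC pool, metadata stays in the replicated pool
       rbd create rbd-meta/vol1 --size 100G --data-pool rbd-data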
  7. CEPH-MGR
     • ceph-mgr
       – new management daemon to supplement ceph-mon (monitor)
       – easier integration point for Python management logic
       – integrated metrics
     • Make ceph-mon scalable again
       – offload PG stats from mon to mgr
       – push to 10K OSDs (planned "big bang 3" @ CERN)
     • New REST API
       – pecan
       – based on the previous Calamari API
     • Built-in web dashboard
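     A short example of enabling the optional mgr modules mentioned above on a Luminous cluster:

       # enable the built-in web dashboard and the REST API module
       ceph mgr module enable dashboard
       ceph mgr module enable restful
       # inspect available and enabled modules
       ceph mgr module ls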
  8. AsyncMessenger
     • Core library included by all components
     • Kernel TCP/IP driver
     • Epoll/kqueue event drivers
     • Maintains connection lifecycle and sessions
     • Replaces the aging SimpleMessenger
     • Fixed-size thread pool (vs. 2 threads per socket)
     • Scales better to larger clusters
     • A healthier relationship with tcmalloc
     • Now the default!
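     A minimal ceph.conf sketch of the messenger selection; async+posix is the kernel TCP/IP backend of AsyncMessenger and the thread count shown is the shipped default:

       [global]
       ms type = async+posix
       ms async op threads = 3    # fixed-size worker pool shared by all connections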
  9. DPDK Support
     • Built for high performance
       – DPDK
       – SPDK
       – full userspace IO path
       – shared-nothing TCP/IP stack (Seastar-inspired)
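     An illustrative ceph.conf fragment for the experimental userspace DPDK stack; the core mask and all addresses are placeholders for a hypothetical host:

       [global]
       ms type = async+dpdk
       ms dpdk coremask = 0x3
       ms dpdk host ipv4 addr = 192.168.0.10
       ms dpdk netmask ipv4 addr = 255.255.255.0
       ms dpdk gateway ipv4 addr = 192.168.0.1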
  10. RDMA Support
      • RDMA backend
        – inherits NetworkStack and implements RDMAStack
        – uses user-space verbs directly
        – TCP as the control path
        – messages exchanged using RDMA SEND
        – uses a shared receive queue
        – multiple connection QPs in a many-to-many topology
        – built into Ceph master; all features fully available there
      • Support
        – RHEL/CentOS
        – InfiniBand and Ethernet
        – RoCE v2 for crossing subnets
        – front-end TCP and back-end RDMA
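      An illustrative ceph.conf fragment for the RDMA backend; the device name depends on the local HCA, and the commented split matches the "front-end TCP, back-end RDMA" bullet above:

        [global]
        ms type = async+rdma
        ms async rdma device name = mlx5_0
        # alternatively, TCP for clients and RDMA on the cluster network:
        # ms public type = async+posix
        # ms cluster type = async+rdma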
  11. Messenger Plugins
      • Posix (kernel) – the default; no special hardware; medium performance; TCP/IP compatible; no OSD storage-engine or disk-backend requirement
      • DPDK + userspace TCP/IP – not the default; requires a DPDK-supported NIC; high performance; TCP/IP compatible; requires BlueStore and NVMe SSDs
      • RDMA – not the default; requires an RDMA-supported NIC and an RDMA-supported network; high performance; no OSD storage-engine or disk-backend requirement
  12. Recovery Improvements
  13. RBD – iSCSI
      • TCMU-RUNNER + LIBRBD
      • LIO + kernel RBD
  14. RBD Mirror HA
  15. RGW METADATA SEARCH
  16. RGW MISC
      • NFS gateway
        – NFSv4 and v3
        – full object access (not general purpose!)
      • Dynamic bucket index sharding
        – automatic (finally!)
      • Inline compression
      • Encryption
        – follows the S3 encryption APIs
      • S3 and Swift API odds and ends
      [Diagram: NFS client → nfs-ganesha (NFSv4) + librgw-file as the NFS server, and apps using the S3/Swift APIs against RadosGW; both paths reach RADOS through the rados API]
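      Illustrative Luminous examples for the compression and resharding items above; the zone and placement names assume a default deployment, and the threshold shown is the shipped default:

        # enable inline compression for newly written objects in the default placement
        radosgw-admin zone placement modify --rgw-zone=default \
            --placement-id=default-placement --compression=zlib
        # dynamic bucket index resharding is governed by ceph.conf options such as:
        #   rgw dynamic resharding = true
        #   rgw max objs per shard = 100000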
  17. CephFS
      • Multiple active MDS daemons (finally!)
      • Subtree pinning to a specific daemon
      • Directory fragmentation on by default
      • Snapshots still off by default
      • So many tests, so many bugs fixed
      • Kernel client improvements
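      A sketch of enabling multiple active MDS daemons and pinning a subtree, assuming a filesystem named "cephfs" mounted at /mnt/cephfs (both names are placeholders):

        # on Luminous, multi-MDS must be allowed before raising max_mds
        ceph fs set cephfs allow_multimds true
        ceph fs set cephfs max_mds 2
        # pin a directory subtree to MDS rank 1 (run on a client mount)
        setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects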
  18. CephFS – MultiMDS
  19. Container
  20. Future
      • RADOS
        – IO path refactor
        – BlueStore performance
      • QoS – dmclock
      • Dedup – based on tiering
      • Tiering
  21. Growing Developer Community
  22. How To Help
  23. Thank you
