
Sheepdog Status Report

Covers the recent Sheepdog updates.


  1. Sheepdog Status Report. Sheepdog Summit 2015. Liu Yuan
  2. Agenda
     • Introduction: Sheepdog Overview
     • Past and Now: Sheepdog Community
     • Work in Progress: Problems and Solutions
  3. Introduction: Sheepdog Overview
  4. What is Sheepdog
     • Distributed object storage system in user space
       – Manage disks and nodes
         • Aggregate the capacity and the power (IOPS + throughput)
         • Hide hardware failures
         • Dynamically grow or shrink the scale
       – Secure data
         • Provide redundancy mechanisms (replication and erasure coding) for high availability
         • Secure the data with auto-healing and auto-rebalancing mechanisms
       – Provide interfaces (in a single cluster)
         • Virtual volume for QEMU VM, iSCSI TGT (best supported)
         • RESTful container (OpenStack Swift and Amazon S3 compatible, in progress)
         • Storage for OpenStack Cinder, Glance, Nova (in progress)
         • POSIX file via NFS (in progress)
         • Linux block device
  5. Disks and Nodes Management
     [Diagram: three nodes, each with a Gateway and a Store over its disks (1TB, 1TB, 1TB; 1TB, 1TB, 2TB; 1TB, 2TB, and one marked X); a 4TB disk is labeled as hot-plugged]
     • Private hash ring: local rebalance
     • Global consistent hash ring and P2P: global rebalance
     • No meta servers!
     • Zookeeper: membership management and message queue
     • Disks are hot-plugged, and auto unplugged on EIO
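The consistent hash ring named on the Disks and Nodes Management slide can be illustrated with a minimal sketch. This is not Sheepdog's code: the class `HashRing`, the SHA-1 based `_hash`, and the choice of 64 virtual nodes per node are all assumptions made for the example. It only shows why adding or removing a node moves a limited set of objects and needs no meta server to look up placement.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hypothetical hash choice)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative, not Sheepdog's code)."""

    def __init__(self, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name

    def add(self, node: str) -> None:
        """Place the node's virtual nodes on the ring."""
        for i in range(self.vnodes_per_node):
            p = _hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        """Take all of the node's virtual nodes off the ring."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def locate(self, oid: str, copies: int = 1) -> list:
        """Walk clockwise from the object's position and pick distinct nodes."""
        start = bisect.bisect(self._points, _hash(oid))
        found = []
        for k in range(len(self._points)):
            node = self._owner[self._points[(start + k) % len(self._points)]]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found
```

Every node can compute `locate()` locally from the membership list alone, which is the property that makes a separate metadata server unnecessary in this kind of design.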
  6. Data Management
     [Diagram: full replication across three sheep nodes; erasure coding across six sheep nodes, with parity]
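The difference between full replication and erasure coding with parity on the Data Management slide can be shown with a toy example. The sketch below uses a single XOR parity strip as the simplest possible stand-in; it is an assumption for illustration and not the erasure code Sheepdog actually uses. The helper names `encode`, `recover`, and `overhead` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal strips and append one XOR parity strip."""
    strip_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(strip_len * k, b"\0")   # zero-pad the last strip
    strips = [padded[i * strip_len:(i + 1) * strip_len] for i in range(k)]
    return strips + [reduce(xor_bytes, strips)]


def recover(strips: list) -> list:
    """Rebuild the single missing strip (marked None) by XOR-ing the rest."""
    missing = [i for i, s in enumerate(strips) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost strip")
    if missing:
        present = [s for s in strips if s is not None]
        strips = list(strips)
        strips[missing[0]] = reduce(xor_bytes, present)
    return strips


def overhead(data_strips: int, parity_strips: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_strips + parity_strips) / data_strips
```

In this model, three full copies correspond to `overhead(1, 2)`, while four data strips plus one parity strip correspond to `overhead(4, 1)`; the trade is lower capacity cost against extra work on reads after a failure.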
  7. Interfaces
     [Diagram: a Sheepdog cluster of sheep nodes and its interfaces. Labels: Object, LUN, Volume, File, Block; HTTP, iSCSI, QEMU, NFS, SBD; OpenStack (Glance, Nova, Cinder)]
  8. Use Patterns
     • VMs running inside the Sheepdog cluster (each node runs SD and a VM)
     • HTTP object storage (diagram labels: HTTP, Nginx)
     • iSCSI backend: LUN device pool
  9. Past and Now: Sheepdog Community
  10. People
      • 2009.9, Kazutaka Morita: open sourced Sheepdog
      • 2011.9, people from Taobao: added features, bug fixing, redesign
      • 2012.4, Christoph Hellwig from Nebula: stayed for around half a year
      • More production uses from around the world: Valerio, Andy, startups in China and Japan
      • 2014, people from Intel: added isa-l for erasure coding
      • 2015, people from China Mobile: make Sheepdog better
  11. Patches
      [Chart: patches per year, 2009 to 2015; y-axis from 0 to 1200]
      ● Patch counts peaked in 2012 and 2013 and have declined recently.
      ● Open sourcing the code is always the easy part; building a community is really difficult.
      ● China Mobile is committed to releasing all its patches to the community.
  12. Comparison with Ceph and GlusterFS
      Pros: simplicity is the biggest advantage of Sheepdog
      ● Sheepdog: 20k+ lines in user space
      ● Ceph: 400k+ lines in user space and 20k+ in kernel
      ● GlusterFS: 330k+ lines in user space
      Cons:
      ● No company behind it
      ● Inactive community
      ● Few users and few developers
      But Sheepdog is not technically inferior! Simplicity doesn't mean bad!
  13. Sheepdog-ng
      Why? We forked it in May because of endless crashes and panics under our stress tests. I discussed a redesign with the NTT guys, the idea being to remove shared state between sheep nodes. They asked me to fork Sheepdog instead, simply because they don't use Zookeeper; this is how they have always replied to users of features they don't use (e.g., object cache).
      The technical reason: share nothing, or share more and more state with overwhelming complexity.
      The non-technical reason: the community is not as friendly and open as before. We want to build a real community-based project.
      Subscribe to the list: send email to
  14. Work in Progress: Problems and Solutions
  15. iSCSI Target Scalability
      [Diagram: STGT: LUN1 and LUN2 go through one main thread to sheep; sync; max requests == number of workers. New target: LUN1 and LUN2 go to sheep with a thread per LUN; async; unlimited!]
      Problems:
      ● The OS tends to issue more and more requests (blk-mq, scsi-mq)
      ● A single LUN can saturate STGT; it does not scale at all
      ● STGT takes too many resources
      ● Multipath is not so good
      Solution – Rewrite
      ● From sync to async, with fewer threads and FDs
      ● Tailored for Sheepdog
      ● Add IO rebalance and cache support
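The "max requests == number of workers" limit of a synchronous target, compared with an asynchronous one, can be modeled in a few lines. This is only a toy model using Python's `asyncio`, not the STGT or new-target code: the function `peak_in_flight`, the semaphore standing in for a worker pool, and the 1 ms sleep standing in for a round trip to sheep are all assumptions.

```python
import asyncio


async def peak_in_flight(requests: int, workers=None) -> int:
    """Return the highest number of requests outstanding at once.

    workers=N models a synchronous target where each request occupies a
    worker until it completes; workers=None models an asynchronous target
    that submits every request without waiting for earlier ones.
    """
    gate = asyncio.Semaphore(workers) if workers else None
    in_flight = 0
    peak = 0

    async def backend_io():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)   # stand-in for a round trip to sheep
        in_flight -= 1

    async def handle():
        if gate is None:
            await backend_io()
        else:
            async with gate:         # a request must hold a worker while it runs
                await backend_io()

    await asyncio.gather(*(handle() for _ in range(requests)))
    return peak
```

Calling `asyncio.run(peak_in_flight(16, workers=4))` and `asyncio.run(peak_in_flight(16))` contrasts the two models: the first is capped by the worker count, the second only by how many requests the initiator issues.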
  16. Performance Degradation
      [Diagram: DHR: a node fails (X), IO hangs, then IO resumes. SHR: a node fails (X), drop this IO.]
      Problem with the default Dynamic Hash Ring (DHR)
      ● If an object is in recovery, we need to wait!
      ● What makes it worse: recovery IO competes with user IO for bandwidth and CPU
      ● Neither slow nor fast recovery is satisfactory
      Solution – Static Hash Ring (SHR)
      The failure of a node won't change the hash ring. Trade data reliability for performance! We don't recover an object if some of its redundant data is missing. Useful for small clusters that mostly deal with single-node events.
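The contrast between the dynamic and the static hash ring can be sketched as follows. This is a simplified model under stated assumptions, not Sheepdog's implementation: one ring point per node, SHA-1 positions, and the helper names `targets`, `write_dynamic`, and `write_static` are all hypothetical. It only shows that a dynamic ring recomputes placement after a failure (so affected objects must be recovered before IO proceeds), while a static ring keeps placement fixed and drops the IO aimed at the dead node.

```python
import hashlib


def ring_position(key: str) -> int:
    """Position of a node or object on the ring (hypothetical hash choice)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def targets(oid: str, ring_nodes: list, copies: int) -> list:
    """Pick `copies` successive nodes clockwise from the object's position."""
    ordered = sorted(ring_nodes, key=ring_position)
    pos = ring_position(oid)
    start = next((i for i, n in enumerate(ordered) if ring_position(n) >= pos), 0)
    return [ordered[(start + k) % len(ordered)]
            for k in range(min(copies, len(ordered)))]


def write_dynamic(oid, members, alive, copies=2):
    """Dynamic ring: failed nodes leave the ring, so placement is recomputed
    and an object whose placement changed has to be recovered first."""
    ring = [n for n in members if n in alive]
    new, old = targets(oid, ring, copies), targets(oid, members, copies)
    return {"write_to": new, "needs_recovery": new != old}


def write_static(oid, members, alive, copies=2):
    """Static ring: placement never changes; IO to a dead node is dropped."""
    placed = targets(oid, members, copies)
    return {"write_to": [n for n in placed if n in alive],
            "dropped": [n for n in placed if n not in alive],
            "needs_recovery": False}
```

In the static case a write lands on fewer copies while a node is down, which is the reliability being traded for performance.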
  17. Live Patching
      [Diagram: call chain A ----> B ----> C; after patching, B` takes the place of B. B` is loaded by Linux's dynamic loader on the fly.]
      Sheep tracer: similar to Linux's Ftrace, it virtually adds a constructor and a destructor to every function. This mechanism relies on the 5-byte space (a.k.a. mcount) injected by GCC beforehand. Based on the tracer, we can replace any function in the sheep daemon on the fly. Useful for one-liner bug fixes, but limited to the function level.
  18. NFS Server
      Current status: just a toy with file size < 4M. NFSv3 is not fully supported and there is virtually no file system code (need to implement inode, dentry, and free space management).
      Todos
      - finish stubs
      - add extents to file allocation
      - add a btree- or hash-based kv store to manage dentries
      - implement a multi-threaded SUNRPC to replace the poorly performing glibc RPC
      - implement NFSv4
  19. OpenStack
      • Cinder - Block Storage
        – Supported since day 1
      • Glance - Image Storage
        – Support merged in the Havana release
      • Nova - Ephemeral Storage
        – Not yet started
      • Swift - Object Storage
        – Swift API compatible, in progress
      • Final Goal - Unified Storage
        – Copy-on-write anywhere?
        – Data dedup?
      [Diagram: sheep nodes providing unified storage to OpenStack Cinder, Glance, Nova, and Swift]
      Plan to rewrite the driver with
  20. Enjoy yourself in Suzhou