Ceph Storage and Penguin Computing on Demand

Presentation to Ceph Days Santa Clara on 9/12/2013

  1. Ceph and Penguin Computing On Demand
     Travis Rhoden
  2. Who is Penguin Computing?
     ● Founded in 1997, with a focus on custom Linux systems
     ● Core markets: HPC, enterprise/data-center, and HPC cloud services
       – We see roughly a 50/50 mix between HPC and enterprise orders
       – We offer turn-key clusters and a full range of Linux servers
     ● Now the largest private system integrator in North America
     ● Stable, profitable, growing...
  3. What is Penguin Computing On Demand (POD)?
     ● POD launched in 2009 as an HPC-as-a-Service offering
     ● Purpose-built HPC cluster for on-demand customers
       – Offers low-latency interconnects, high core counts, and plentiful RAM for processing
       – Non-virtualized compute resources, focused on absolute compute performance
       – Tuned MPI/cluster stack available "out of the box"
     ● "Pay as you go" – pay only for what you use, charged per core-hour
     ● Customizable, persistent user environment
     ● Over 50 million commercial jobs run
  4. Original POD designs
     ● Original clusters used standalone direct-attached-storage NFS servers
     ● Login nodes ran on VMware, then KVM, with VM disks stored locally on the host
  5. Original POD limitations
     ● Disparate NFS servers led to a non-global namespace
       – Users were unable to take advantage of all installed storage
       – Not all disks could contribute to performance (no scale-out effect)
       – A full NFS server affected the users co-resident on it
       – The NFS server RAID card was a single point of failure (SPoF)
     ● We never lost data, but there were times when data was inaccessible
     ● VM login nodes were handled by a standalone set of hardware
       – Storage servers were not leveraged for hosting VM disks
  6. POD New Architecture
     ● Time for something different
       – More expandable
       – More fault tolerant
       – More flexible
     ● OpenStack & Ceph
  7. POD Ceph Usage – OpenStack
     ● Ceph's OpenStack integration is a big plus (see the sketch below)
       – Store disk images in Ceph (Glance)
       – Store volumes in Ceph (Cinder)
       – Boot VMs straight from Ceph (boot from volume)
       – Leverage copy-on-write (COW) semantics for boot-volume creation
       – Live migration
     ● No immediate need for RADOSGW
       – Nice to know it's there if we need it
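A rough sketch of the Ceph-side setup behind the Glance/Cinder integration above: dedicated pools plus cephx identities for each service. The pool names, PG counts, and capability strings are illustrative assumptions rather than POD's actual settings, and the matching OpenStack-side option names vary by release.

```
# Hypothetical pools for Glance images and Cinder volumes
ceph osd pool create images 128
ceph osd pool create volumes 128

# cephx users for the OpenStack services (simplified capabilities)
ceph auth get-or-create client.glance \
    mon 'allow r' osd 'allow rwx pool=images'
ceph auth get-or-create client.cinder \
    mon 'allow r' osd 'allow rwx pool=volumes, allow rx pool=images'
```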
  8. POD Ceph Usage – RBD
     ● The same storage system hosts RBDs for us
     ● Each POD user has their $HOME in an RBD
       – To make it visible to all compute nodes and customer-accessible login nodes, we mount the RBD on one of several NFS servers and export it from there (sketch below)
       – We aren't quite ready to put our full weight behind CephFS, but early testing has started
       – We know this creates a performance bottleneck, but the pros outweigh the cons
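A minimal sketch of the $HOME-in-an-RBD arrangement described above. The pool name, image name, size, mount paths, and export subnet are all assumptions.

```
# Create and map one user's $HOME as an RBD image on an NFS server
rbd create homes/jdoe --size 51200        # 50 GB image in an assumed 'homes' pool
rbd map homes/jdoe                        # appears as e.g. /dev/rbd/homes/jdoe
mkfs.xfs /dev/rbd/homes/jdoe
mkdir -p /export/home/jdoe
mount /dev/rbd/homes/jdoe /export/home/jdoe

# Re-export over NFS to the compute and login nodes
echo '/export/home/jdoe 10.1.0.0/16(rw,no_root_squash)' >> /etc/exports
exportfs -ra
```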
  9. POD Ceph Usage – RBD Pros and Cons
     ● Pros
       – Thin provisioning
       – Per-user backups and snapshots
       – A clean 1:1 user-to-block-device mapping to export
     ● Cons
       – The NFS server is a SPoF and a bottleneck
       – Loss of parallel access to OSDs
       – Slow-ish resize
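For context, these are the kinds of per-image operations behind the pros and cons listed above, reusing the hypothetical image from the earlier sketch.

```
# Per-user snapshot: a point-in-time backup of one $HOME
rbd snap create homes/jdoe@nightly-2013-09-12
rbd snap ls homes/jdoe

# Resize (the filesystem inside must then be grown, e.g. with xfs_growfs)
rbd resize homes/jdoe --size 102400
```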
  10. POD Storage Hardware
      ● Started with 5x Penguin Computing IB2712 chassis
        – Dual Xeon 5600-series CPUs
        – 48GB RAM
        – Dual 10GbE
        – 12x hot-swap 3.5" SATA drives
        – 2x internal SSDs for the OS and OSD journals (example layout below)
          ● 6 journals on each SSD
      ● 60x 2TB drives → 120TB raw storage
        – 109TB available in Ceph
      ● XFS on OSDs
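One way the SSD-hosted journals could be wired up in ceph.conf for this kind of chassis. The hostname, device path, and journal size below are assumptions, not POD's actual configuration.

```
# Append an illustrative journal layout to ceph.conf
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
    ; roughly 10 GB journal per OSD, carved out of an internal SSD
    osd journal size = 10240
    osd mkfs type = xfs

[osd.0]
    host = stor01
    ; one of twelve journal partitions, six per SSD (device name assumed)
    osd journal = /dev/sdm1
EOF
```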
  11. POD Ceph Storage Config
      ● Running 3 monitors
        – On the same chassis as OSDs (not recommended by Inktank)
      ● Running 2 MDS processes
        – On the same chassis as OSDs
        – 1 active, 1 backup
      ● Each chassis has a 2-port 10GbE LAG to the top-of-rack switch
      ● 2 replicas
      ● Separate pools for Glance, Cinder, and user $HOMEs
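A few sanity checks that correspond to the layout above; the pool name in the last command is an assumption.

```
ceph -s                           # overall cluster health, OSD and PG state
ceph mon stat                     # expect 3 monitors in quorum
ceph mds stat                     # expect 1 active MDS, 1 standby
ceph osd pool set homes size 2    # enforce 2 replicas on a pool
```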
  12. CephFS on POD
      ● The primary use case for storage on POD is users reading and writing data in their $HOME directories
      ● On our HPC clusters the workload is primarily sequential writes, with sequential reads and a bit of random I/O
      ● Running VMs also produces random I/O
      ● Since users can run jobs spanning dozens of compute nodes, potentially all hitting the same directories, it would be nice to use CephFS rather than NFS
      ● Testing it as scratch space is a good way to start
      ● Using ceph-fuse, since the cluster runs CentOS 6.3 and its stock kernel is too old for the CephFS kernel client (mount example below)
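A minimal ceph-fuse mount of the scratch space, assuming a monitor host named mon01 and a /scratch mount point.

```
# Mount CephFS via FUSE on a CentOS 6.3 compute or login node
mkdir -p /scratch
ceph-fuse -m mon01:6789 /scratch

# Unmount when done
fusermount -u /scratch
```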
  13. CephFS initial benchmarks
      ● Simple dd, 1GB file, 4MB blocks
        – dd if=/dev/zero of=[dir] bs=4M count=256 conv=fdatasync
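The write test from the slide can be pointed at both a ceph-fuse mount and an NFS-backed $HOME for comparison, with a matching read pass; the paths below are assumptions.

```
# Write test against both mounts
for dir in /scratch /home/jdoe; do
    dd if=/dev/zero of=$dir/ddtest bs=4M count=256 conv=fdatasync
done

# Read back after dropping the page cache so reads hit the storage
sync && echo 3 > /proc/sys/vm/drop_caches
dd if=/scratch/ddtest of=/dev/null bs=4M
```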
  14. Ceph Lessons Learned
      ● This is our 3rd production Ceph cluster
        – The 1st has been decommissioned; it ran Argonaut and Bobtail and used IPoIB
        – The 2nd is being decommissioned and still runs Bobtail
        – The 3rd is the primary workhorse for a production POD cluster; it launched on Bobtail and now runs the latest Cuttlefish
      ● For RBD, a very recent Linux kernel is a must if using the kernel client
        – Kernels before 3.10 had kernel-panic issues when cephx was enabled
      ● SSDs are nice, but may not be the best bang for the buck
        – 3-4 OSD journals per SSD is ideal, but adds significant cost
        – We've seen promising results using higher-end RAID controllers in lieu of SSDs, thanks to their write-back cache, at an overall lower cost
        – We still need more testing to see how this carries over for sequential vs. random, and small vs. large I/O
      ● Need to work hard to balance density against manageable failure domains
        – Density is very popular, but leads to a lot of recovery traffic if a server fails (see the throttling sketch below)
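On the density point, one common mitigation is to throttle backfill and recovery per OSD so a failed dense server does not swamp client I/O. The values below are illustrative, not POD's settings.

```
# Runtime change on all OSDs
ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Equivalent persistent settings in ceph.conf:
# [osd]
#     osd max backfills = 1
#     osd recovery max active = 1
```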
  15. Thanks!
      @off_rhoden
      trhoden@penguincomputing.com
      @PenguinHPC
