Ceph Day Santa Clara: Ceph at DreamHost

Dallas Kashuba, Co-Founder of DreamHost, walks through why they went with Ceph at the Santa Clara Ceph Day.

Published in: Technology, Business

Ceph Day Santa Clara: Ceph at DreamHost

  1. Ceph at DreamHost: A Storage Journey
  2. About Me
     • One of the original four founders of DreamHost
     • Still active daily at DreamHost
     • Have spent a lot of time working on the Ops side
  3. DreamHost
     • Hosting company founded in 1997
     • Sage’s other company
     • Shared hosting, virtual servers, dedicated servers, cloud storage, cloud computing
     • 375k customers, 1.3 million websites
  4. Storage Journey: A long strange trip
  5. His name was Destro
  6. ... and then there were more.
  7. The First NetApp
  8. Remote Failover
  9. Remote Failover
  10. Meanwhile...
  11. ... and still more.
  12. Lots of NetApps
      • Peak of around 125 individual NetApps
      • Smallish capacity on each (8TB)
      • Internal software continuously moving data between NetApps
      • Lots of time spent managing nearly full filers
  13. Ideal
  14. Reality
  15. Hosting Landscape
      • Included storage had grown from 50MB to gigabytes, then terabytes
      • Prices stayed the same
      • Eventually went to unlimited storage
      • Usage per customer skyrocketed
  16. Failed Experiments
  17. Failed Experiments
      • ATAoE and XFS-based systems
      • Performance and stability issues
      • 2006-era gear
  18. Failed Experiments
      • High capacity
      • Nice features
      • Expensive
      • 85% full and it failed
  19. Some Success
      • First on Sun hardware, then Supermicro
      • Great stability
      • Not enough IO for front-line network storage
  20. Back to Basics
  21. Local RAID: The Good
      • SATA drives had grown in capacity and were very cheap
      • 4-6TB per hosting server
      • Less dependence on congested network
      • Smaller failure domains
  22. Local RAID: The Bad
      • No more quotas: too slow to scan the filesystem
      • No more fast failovers
      • Multi-hour filesystem checks with ext3
      • More failure domains
  23. Local RAID: The Ugly
      • Complete RAID loss more common than anticipated
      • Multiple days to fully restore from backup
  24. Storage Today: Light at the end of the tunnel
  25. Hybrid Mix: Best Tool For The Job
      • We learned something from every step of the way
      • No one size fits all when it comes to storage
      • Use whatever is best for the job
      • Be ready to change
  26. A Bit of Everything: Best Tool For The Job
      • Clustered NetApps and NFS for email
      • Local RAID in hosting servers
      • ZFS and OpenSolaris backup servers
      • Ceph for DreamObjects and DreamCompute
  27. DreamObjects
      • Object storage, S3/Swift compatible
      • 2+ petabytes raw storage
      • 3x replication, 900+ OSDs
      • RGW behind HAProxy
      • Row, rack, node and disk fault tolerant
  28. DreamCompute
      • OpenStack-based public cloud
      • 3+ petabytes raw storage
      • All storage is on Ceph RBD
      • Boot and attachable volumes
      • Nicira SDN + Ceph, live migration
  29. [Architecture diagram]
      Diagram labels: The Internets, HA Load Balancer, MySQL / PostgreSQL, Horizon, Cockpit Pod, Glance, Keystone, Nova, Quantum, Cinder, Nicira NVP, Glance Store (Ceph), OS Mirrors (apt), Ceph Monitors, Opscode Chef, Logstash + Graphite, "Thar be dragons here!"
      Compute Pod (shown three times, each with its own Nicira NVP)
      • Networking gear
      • 8x hypervisor nodes, each with 192 GB RAM and 64 AMD cores
      • 14x storage nodes, each with 12x 3TB disks
      Pods
      • 512 cores
      • 1.5TB of RAM
      • 504TB raw storage
      • 168TB redundant storage
      Networking
      • ODM switches w/ Linux
      • 10Gbps everywhere
      • IPv6 from the ground up
      • Spine and leaf topology
      • 120 Gbps between pods (!)
  30. CephFS & The Future: Storage Panacea?
      • The return of failovers
      • No more backup servers
      • No more major disk-related outages
      • Fault tolerant low cost hosting
  31. Thanks! @dallas, dallas@dreamhost.com
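
The per-pod totals on slide 29 follow directly from the node counts on the same slide. The sketch below redoes that arithmetic. The class and field names are illustrative only, the replica count of 3 is an assumption inferred from the 504TB raw to 168TB redundant ratio (it matches the 3x replication quoted for DreamObjects), and 1TB of RAM is taken as 1024GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputePod:
    """Per-pod hardware counts as read off slide 29 (names are illustrative)."""
    hypervisor_nodes: int = 8
    cores_per_hypervisor: int = 64
    ram_gb_per_hypervisor: int = 192
    storage_nodes: int = 14
    disks_per_storage_node: int = 12
    disk_size_tb: int = 3
    # Assumption: 3 replicas, inferred from 504TB raw vs 168TB redundant.
    replicas: int = 3

    @property
    def cores(self) -> int:
        return self.hypervisor_nodes * self.cores_per_hypervisor

    @property
    def ram_tb(self) -> float:
        # Assumption: 1TB = 1024GB.
        return self.hypervisor_nodes * self.ram_gb_per_hypervisor / 1024

    @property
    def raw_storage_tb(self) -> int:
        return self.storage_nodes * self.disks_per_storage_node * self.disk_size_tb

    @property
    def redundant_storage_tb(self) -> float:
        # Usable capacity once every object is stored `replicas` times.
        return self.raw_storage_tb / self.replicas


pod = ComputePod()
print(pod.cores, pod.ram_tb, pod.raw_storage_tb, pod.redundant_storage_tb)
# prints: 512 1.5 504 168.0
```

These match the slide's figures of 512 cores, 1.5TB of RAM, 504TB raw and 168TB redundant storage per pod.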
